THE MEASUREMENT OF LIKING AND DISLIKING
JOHN P. HERRING 


Ohio State University 


GENERAL ORIENTATION 


If we were able to measure amounts of liking and disliking, then 
we should also be able (1) to estimate the potency of human drives, 
making accurate inventories of the objects of human desire; (2) to 
determine the range, complexity and inter-organization of purposes, 
distinguishing between modifiable and unmodifiable ones; and (3) 
to define individual and group purposes in a quantitative manner, 
as to feasibility, methods, limitations, permanence and costs of 
altering ends of action. Powerful tools would become available for 
prediction and control of behavior through prediction and control 
of affection. Motivation in education could be made quantitative 
and could be related to measures of intelligence, maturity, achieve- 
ment, and such trustworthy accomplishment-differences as are already 
made. The aims of the individual could be clarified for his employer, 
for his friends, for himself. 

For the purpose of experimenting with conditioning affective 
responses, I have assumed that solution of problems of measurement 
should precede solution of problems of conditioning itself; that the 
law of large numbers, as exemplified by the use of probable errors, 
should be employed with rigor in earlier as well as later work; that the 
function of descriptive qualitative work is to suggest, for example,
problems, hypotheses and means of rigorous research, proceeding at 
all stages hand in hand with quantitative work; that conclusions 
such as some of Watson’s on the conditioning of liking-disliking 
ought to be brought to rest upon quantitative foundations; that
conclusions resting upon qualitative observation alone are peculiarly 
susceptible to reversal; while those resting upon quantitative observa-
tion alone are peculiarly susceptible to meagreness of hypothesis and
inadequacy of interpretation in respect to complex human contexts. 

In order to make some beginning toward these broad purposes, 
experiments were planned in which the field was limited for a larger 
part of the time to tastes of a few substances and in which the imme-
diate aims were largely confined to study of measurability of liking
and to study of changes of response, both when the responses were 
isolated, and when they were accompanied by other affective 
responses; and within this second category, both when there was only 
one accompanying response and when there was more than one. 
Responses to aural and visual stimuli were also studied, the results 
being reserved for the most part for later report. 

Two criticisms, opposite in tendency, have been directed against 
these experiments; one that since a child lives as a whole, the elements 
of a child’s activity should be studied in usual setting instead of in 
that which typifies laboratory procedure; that is to say, that the experi- 
ments were over-simplified: the other, that the experiments were too 
complex; that effects could not be related to causes because there were 
too many unisolable variants in, perhaps, the simplest situation 
obtainable. 

To the first criticism, I would say that I think observation of 
behavior in a natural setting to be necessary at a later stage. It is 
also fruitful to study behavior in settings from the simplest to the 
most complex; so that this criticism may be accepted as emphasizing 
that no stage of experimentation is in itself sufficient. 

The suggestion that the experiments were too complex, is answered 
in Series VIII and IX of the study, in which ultimate simplicity 
was sought, response to a single stimulus being observed for many 
days. 

With the broadest ends in view, then, but working first in terms 
of more limited and immediate ends, ten units of experimental work 
were completed, nine of them at the Institute of Child Welfare Research 
of Teachers College, Columbia University, and one at the Ethical 
Culture School in New York City: 

Series I served purposes of exploration and orientation and enabled 
us (1) to experiment with methods of rating amounts of liking; (2) 
to establish in a preliminary way the reliability of rating; and (3) 
to throw light upon relations of amount of liking at the beginning 
with subsequent change in amount of liking. This series was also 
related to the possibility of conditioning indifferent responses, in
either positive or negative direction, an aspect not discussed in this
paper. 

Series II, to be reported later, repeated an experiment by F. A.
Moss on the conditioning of response to a metallic clicker by means
of tastes, but with a larger number of children and with measured
results; it also continued and confirmed certain results of
Series I under somewhat different circumstances.

Series III added to the foregoing a study of validity of measures by 
correlating ratings with choices; and further confirmed and improved 
reliability of measures. 

Series IV, V and VI, which are not discussed here, relate to an
aspect mentioned in Series I: the possibility of changing liking to disliking.

Series VII has been reported by Gauger in “Modifiability of
Responses to Taste Stimuli,” published in 1929 by the Bureau of
Publications, Teachers College, Columbia University, and receives
further treatment in this paper.¹

Series VIII yielded knowledge of changes in liking for one stimulus 
when all other affective responses were held experimentally constant, 
and also knowledge of effect of a second response introduced after a 
first reached its plateau. This series was conducted at the Ethical 
Culture School in New York City.²

Series IX yielded further knowledge of the validity of measures 
by means of a second method: Observation whether the ratings of 
amount of liking for two stimuli were reversed when children’s choices 
were reversed. 

Series X, though not discussed here, investigated the possibility
of turning positive into negative affective response, using white rats 
and noisy clangs. This experiment was somewhat like some of 
Watson’s, except that the behavior was measured. 

In seven of the experiments I was ably assisted by Miss Ruth 
Kotinsky, to whom I am deeply grateful for insight, criticism and 
generous contribution of time.


Aspects of the ten series not here presented will be made the 
topic of later publication. 





¹ I am indebted to Miss Gauger for competent cooperation in daily detail
through Series I to VII.
² I am much indebted to Miss Ellen J. O’Leary for making her Grade I children
available and for her competent collaboration through a long and tedious
experiment.


SELECTION AND DEFINITION OF APPROACH


The possible criteria of measures of degrees of satisfaction and of 
annoyance must be found among the following: 


Experience.—Inner feelings of liking and disliking. 

Physiological Change Not Behavior.—Breathing, circulation, secretion, heart- 
beat, etc. 

Behavior.—Movements of acquisition and rejection and related movements. 


The first is difficult as a criterion because it is difficult if not impos- 
sible to measure. 

The second, while having much promise, was not studied in these 
experiments. 

The third, behavior, is divisible in two modes: Observed facts of 
categorical choice between paired stimuli; and observed facts of con- 
tinuously variable degrees of acceptance and rejection. 

Behavior is the basis of these experiments, the first mode therein 
being taken as criterion for the second. 

In the first mode we observe whether an animal takes milk or 
meat. This type of observation tells immediately whether he likes 
milk or meat better; and mediately it may be used to tell how much 
better. 

In the second mode we observe that an animal ran violently toward 
meat, gulping it all down; or that he approached slowly, smelled criti- 
cally and ate sparingly. This type of observation tells immediately 
how much he likes meat; and may be used mediately, by means of 
comparison, to tell which he likes better. 

The interpretation of the facts of behavior in both modes is prone 
to fall into the soft ground of mental states as criteria. The situation 
is apt to seem too simple: In the first mode it is thought that the child, 
remembering his mental states, and thereby knowing which of two 
things he will like better, tells by choosing; and in the second mode, 
the degree of his inner liking is thought to be represented in facial
and other movements. This is naive not because it may not be true, 
but because there is no way of showing whether it is true or not. 

The following interpretation, not of course new, is the one used 
in this context. 

A stimulus which an organism likes is defined as one which it does 
something to draw near or to obtain, or as one which it has and does 
nothing to avoid. A stimulus which an organism prefers to another is 
defined as one which it does something to approach or obtain when
it could instead approach or obtain the other, or as one which it has
and does nothing to avoid when it could have the other instead. 

The more frequent forms of response to taste stimuli as they 
occurred in these experiments were such as: Turning toward, reaching 
for, taking, handling, smiling, eyes lighting, saying “I want some,”
eating without turning away or pushing aside, etc., and doing nothing
to avoid; and turning away, pushing aside, scowling, spewing out,
puckering, closing lips tight, shaking head, gagging, vomiting, saying
“I don’t want that,” saying “My mother says I shouldn’t eat that,”
and doing nothing to approach. 

It is this objective, observable, behavioristic meaning of the terms 
liking and disliking and their synonyms which will be used throughout 
except when context explicitly indicates otherwise. 

The term affective is used throughout only in denotation of liking- 
disliking. 

The meaning of liking and disliking as having reference to sub- 
jective events is thus regarded as indeterminate, or at least, as unde- 
termined for these experiments, and the subjective events themselves 
as very difficult if not impossible to measure. What affective state 
the individual has, how happy he is during a given response, may be 
observed directly only by one observer, the experiencing subject, and is 
therefore socially unidentifiable; and it may be observed only once by 
him, and is therefore unverifiable even within his experience. As an 
unique event, observed by an unique person, it is poor matter for 
science. The subjective aspect seems likely to remain secret. 

The difficulty of scientific treatment of what it is tempting to call 
likes and dislikes themselves may, for all we know, be a great misfor- 
tune, but while inescapable it is one to which society has long since 
made adjustment. Social machinery for attaining ends, never able 
to work directly with states of satisfaction and annoyance, seems 
usually to have assumed that acts of seeking are evidence of affection 
itself. Accordingly an act which is taken to represent affection is 
often accorded the response which it is thought would be incited by the 
liking itself, could that be directly observed. Acts of seeking, rather 
than states of affection, are the subject of legislation and judicial 
decision, the basis of institutions, and to an increasing degree, the con- 
tent of ethics and of psychology. Even were it possible to have 
immediate knowledge of subjective states in others than ourselves, 
it might still often be desirable to reserve praise and blame for overt 
acts.

In defining likes and dislikes as we have done, and in determining
to deal only with overt acts of which there may be many observers, 
we are the more likely to draw conclusions which may be used by 
teachers, legislators, judges, producers, merchants. So far therefore 
from being handicapped by failure to measure the thing which is really 
interesting, it turns out that we ought to be especially interested in the 
thing it is feasible to measure,—behavior. 

One implication of technical importance to investigation may be 
taken from the foregoing: Since we are held from observing pleasure 
itself, which it is both alluring and naive to think feasible as a crite- 
rion, it is the more imperative to employ two independent methods 
of measuring the relevant behavior. In these experiments the two 
methods were observation of choices between stimuli presented two 
at a time; and the observation, by two and sometimes by three observ- 
ers, of degrees of approach and retreat. 


NECESSARY CONDITIONS 


In order that the processes of change in the amounts of liking and 
disliking might be observed over a long period of daily stimulation of 
young human subjects, certain conditions had to obtain. 

The range of disagreeable stimulation must be suitably limited so 
that neither parents nor children would stop experimentation. 

It was necessary to isolate a given affective response so that it 
could be watched without the admixture of other such responses; 
that is, it was necessary to study the responses each in an affective 
vacuum. 

It was necessary also to study instances more nearly under the 
conditions of usual life, that is, not in affective vacua, but in environ-
ments in which more than one affective phenomenon was present.

And, with regard to technical method, it was necessary to measure. 
This involved the conventions of reliability, objectivity and validity
of measures, but did not go so far as to investigate the equality of 
units, and the location of reference points of an absolute character, 
with the exception of the point of zero liking and zero disliking. 


EXPERIMENTAL PROCEDURES AND LABORATORIES 


To these ends a number of experimental procedures were devised, 
each, after the first, to a degree the result of experience with preceding
experiments. The following description of procedures and labora-
tories is typical of all the experiments.

The laboratory in which the experiments took place was a school 
room from which all school furniture had been removed. Screens 
were so placed that the experimental subjects could see, say, only a 
third of the room, could not touch the screens during the experiment, 
and could not see the table bearing the materials of experimentation. 
Within the approximate semi-circle formed by the screens was a child’s 
chair and table, fastened to the floor to avoid distractions of movement. 
The chair faced light from large windows, which had no direct sunlight 
during the experiments, or when having it, had shades drawn so as to 
maintain approximately uniform lighting from day to day. About 
six feet in front of the child’s chair and table were two chairs for adult 
observers, who sat facing the child and backing the light. The child 
was thus prevented from seeing the observers very clearly, while 
the observers had good light upon the child. The adult, G, who admin-
istered stimuli, moved back and forth between the child and the table
beyond the screens. 

A child was seated in the chair before the table. Previous trials 
had resulted in the elimination, by removal or concealment, of distrac- 
tions which most incited responses interfering with experimentation. 

In Series I this end was accomplished by a gradual elimination of 
distractors of attention from day to day, as the sources were disclosed. 
In Series II a more studied elimination was made at the beginning, 
and in Series VII the most rigorous elimination of distractors of 
attention-affection was instituted by means of experimentation with 
the laboratory environment for several days before the beginning 
of the series. This served also to accustom children to the surround- 
ings beforehand, as well as to inform the experimenters how they must 
govern their own facial expressions, words, and other behavior in 
order to elicit from subjects only neutral or zero affective responses. 

On the table before the child were laid, for example, a measured 
amount of water and of sugared orange juice in containers, one to the 
right and one to the left. The subject was given a standard amount of 
water from a spoon and then after about a minute a standard amount 
of orange juice. As the subject responded to the tastes, two or three 
judges estimated the amount of seeking or avoiding shown in the 
behavior, in a numerical scale from minus thirty to plus thirty. 
Descriptive notes were taken, but not as fully as would be demanded 
had the methods been primarily descriptive instead of primarily
quantitative. One purpose of descriptive comment was to collect
material illustrative of the meaning of numerical estimates. In 
Series VII, immediately after the first pair of tastes were given, G 
said to the child: “Which one do you want? You cannot have both.
You may just have one.” With most children “Which one do you
want?” was sufficient, and often choice was made before words were
spoken. If necessary at first the words were repeated to bring response, 
and in rare instances other words were extemporised. Children who 
did not learn what was intended from speech alone usually learned 
during the course of one or two ten-minute periods of experimentation 
from speech combined with action. One or two children were aban- 
doned as experimental subjects for not learning to choose. 

When the child chose, the choice was recorded by G, the child 
was given the taste he chose, and quantitative estimates of amounts 
of liking or disliking were made for each event and were recorded by
the judges. 

Then other pairs of stimuli were given in the same manner. The 
stimuli in Series VIII, for instance, were taken from the following 
tastes: Water, orange juice, chocolate, sugar, honey, raisins. Let 
each be represented by its initial. The pairs were then, ch, cr, cw, 
hr, hs, hw etc., as shown in Table VII. 

No measures of the degree of agreeableness of these tastes in the 
case of children or of adults were available at the beginning, except 
for raisins, which had, in a previous experiment, been liked about +6 
on the average by sixteen children of roughly similar selection. So 
H, K and G, who were judges, tasted a variety of substances repeatedly
and ranked them for agreeableness as best they could, allowing their 
judgment to be influenced somewhat by their ideas of children’s 
relative likes and dislikes. It may be that we thus obtained a selection 
which better suited some of our purposes. We wanted pairs, some 
members of which differed widely in their amounts of agreeableness, 
some which differed little, and some which differed in a middling 
degree. These ends were accomplished. We also wanted a wide 
range of agreeableness, without using annoying tastes, an end accom- 
plished to the extent shown in Table XI. 

Precautions were observed in the administration of stimuli, placing 
now to the right and now to the left in a predetermined random order; 
administering first now this and now that in a predetermined random 
order; arranging the events in a certain serial circular order of presen-
tation, used for convenience for all children, but varying the place of
beginning in a predetermined random order. Thus clues by which
judges could tell what taste was being given were studiously avoided. 

The substances to be tasted were concealed from judges H and 
K, while G brought and administered them, so that H and K were 
as far as possible without clues to the identity of stimuli. This con- 
cealment was imperfect, however, since the manner of receiving and 
chewing were at times revealing. H and K tried not to give attention 
to stimuli, but only to responses. Evidence of success consists not 
only in their own impression that they were usually able to do so, 
but also in the probable errors of estimate of true scores of the thing 
estimated for the three pairs of judges. For instance, the average 
$PE\sqrt{1-r}$ for judges H and G for columbo and for raisins, and also
the average for H and K for the same stimuli (being the four probable 
errors of estimate involving H) averaged 1.5 points. The same aver- 
age of 1.5 was obtained for G and for K. This is computed from the 
data of periods three to six of Table IV, the last two-thirds of Series I. 
Substantial equality of these measures of reliability-objectivity of 
judgment indicates that G, who knew the identity of every stimulus, 
since she prepared and administered it herself, judged no more con- 
sistently than those who certainly did not know the identity of most 
of the stimuli and made every effort to avoid knowing any. There 
was an equality of $PE\sqrt{1-r}$ for H, K and G, obtained both in Series
I and in Series III, in the first of which the three judges were not 
prevented from knowing what stimuli were used, while in the other, 
one judge knew and two very seldom knew. 

For this there may be a number of reasons: (1) the judges reflected 
that the same stimulus may be liked in differing degree at different 
times; (2) that a child may inhibit differing amounts of the behavior 
of accepting and rejecting on different days; (3) that when it was known 
what stimulus was given, and when H and K both thought this knowl- 
edge had affected the judgment of degree of liking, it was sometimes 
immediately found that the unhesitating choice of the child contra- 
dicted the judgment; and (4) that a stimulus might be liked more or 
less according to what taste had preceded it. Such considerations 
led the judges naturally to mistrust any means of judging except those 
derived immediately from movements of facial and bodily muscula- 
ture, watering of the eye, etc. 

It was always left to the judges to decide in what degree and in 
what aspects each response was relevant in determining an amount 
of liking. In the first series the judges tried to judge reliably and
validly by discussing particular judgments and observations daily
after the experiments. In Series II to IX there was very little of such
discussion because very little seemed to be needed, and it was of course 
better to rule it out as soon as it was no longer needed. 

Judgment of the amount of liking-disliking was expressed in a 
number series from —30 through 0 to +30, a series intended to 
correspond, as nearly as we could guess, to the entire gamut of pre- 
kindergarten affection in normal living, from —3 SD through 0 to +3 
SD; that is, from the most disagreeable experience in young normal 
life to the most agreeable. Each judge used three reference points; 
extreme disliking, indifference, and extreme liking; and in addition, 
during very early work, related his judgments to those of other judges 
by means of daily comparison. 


RELIABILITY-OBJECTIVITY OF MEASURES 


Reliability and objectivity are put together with a hyphen because 
the coefficients presented depend on correlations between the judg- 
ments of two judges. 

The evidence for the reliability-objectivity of the measures is in 
general adequate for the uses to which they are put. 

In Series I probable errors of estimate for judging affective response 
to raisins and columbo, as judged by H, K and G are shown in Table I. 
(See end for all tables.) 

Table II shows an average probable error of estimate of 1.0 for one 
child for six different stimuli of Series VII, in which n = 31, (each 
scatter-gram entry corresponding to one day of experimentation). 

Table III shows a sampling of correlations, etc., between judgments 
of K and G in Series VII in the case of eight different stimuli, one 
correlation for eight stimuli occurring each day for twelve different 
days. The probable errors of estimate range between 0.3 and 1.2 
units in the scale of sixty units from —30 through 0 to +30, averaging 
0.6. The r’s range from .87 to .99 in a range of affection varying from
49 to 16 of the scale. 

For Series III the probable errors of estimate, based upon correla- 
tions between judged responses to stimuli which were repeated after 
one or two minutes, are shown in Table XII. They averaged 0.77 
for six stimuli: Water, orange juice, sugar, raisins, honey and chocolate. 

For Series I the standard errors of estimate based upon judgments 
of H and K in six equal successive periods, together comprising the
whole series, are as shown in Table V. They averaged 2.7; that is,
an average probable error of 1.8. 
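
The conversion from a standard error of 2.7 to a probable error of 1.8 can be checked with a short sketch. It assumes the conventional normal-curve relation, in which the probable error is about 0.6745 times the standard error; the text applies this relation without stating it, and the helper name below is illustrative only.

```python
from statistics import NormalDist

# Assumption: the probable error bounds the middle half of a normal
# distribution, so the factor is the 75th-percentile z-score (~0.6745).
PE_FACTOR = NormalDist().inv_cdf(0.75)


def probable_error(standard_error: float) -> float:
    """Convert a standard error to a probable error (hypothetical helper)."""
    return PE_FACTOR * standard_error


if __name__ == "__main__":
    # Average standard error of estimate reported for Series I: 2.7
    print(round(probable_error(2.7), 1))  # 1.8
```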

For all the experiments taken together, the $PE\sqrt{1-r}$ of the thing
measured was about 0.9 in a range of sixty points. Making the very
excellent assumption of equality of standard deviations in the case of
reliability-objectivity coefficients, the standard error of a difference
is equal to $\sqrt{2}$ times the standard error of estimate of a true score.
That is, $\sqrt{(SD)^2 + (SD)^2 - 2r(SD)(SD)}$ is equal to $\sqrt{2}\,(SD)\sqrt{1-r}$.
In general, therefore, the PE of differences between one judgment of
one judge and one of another averaged about 1.3.
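
The identity used in this paragraph, and the step from 0.9 to about 1.3, can be verified numerically. The following is a minimal sketch; the function names and the trial values of SD and r are hypothetical and serve only to check the algebra under the stated assumption of equal standard deviations.

```python
import math


def se_of_difference(sd: float, r: float) -> float:
    """Standard error of a difference between two measures of equal SD."""
    return math.sqrt(sd**2 + sd**2 - 2.0 * r * sd * sd)


def se_of_estimate(sd: float, r: float) -> float:
    """Standard error of estimate of a true score, SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - r)


if __name__ == "__main__":
    sd, r = 10.0, 0.90  # hypothetical values, used only to check the identity
    assert math.isclose(se_of_difference(sd, r),
                        math.sqrt(2.0) * se_of_estimate(sd, r))
    # The same factor applies to probable errors: sqrt(2) * 0.9
    print(round(math.sqrt(2.0) * 0.9, 1))  # 1.3
```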

There is ground to believe that in series after the first two, G and 
K were doing as competent judging as is reasonable to expect in such 
circumstances. They had judged together daily for one academic 
semester before beginning Series VII. Their probable errors of esti- 
mate were equal and remained so from early in Series I. They had 
diminished these coefficients as time went on, and had reached a 
plateau. 





VALIDITY 


The problem of validity as here treated involves certain assump- 
tions. It is assumed that children’s choices between paired stimuli 
are a better criterion than the judgments of adults who observe 
the behavior of liking and disliking. It seems difficult to conceive 
a better criterion than choices. Subjective states of affection are 
not easy of approach by direct measurement; and in the case of 
indirect measurement of them it is likely that choices would be the 
best criterion. Even if subjective states could be exactly measured 
by some means, it is likely that choices would be estimated of at 
least equal importance, and if not so estimated, then certainly of 
very great importance. 

It is also assumed that the categorical facts of choice, had they been 
measurable in continuous degrees of amounts of liking-disliking,
would be normally distributed. That is, when a and b are presented 
many times as alternatives for choice, and a is chosen any given per 
cent of the time, then if the amounts by which a was liked better than 
b had been measured in all the events these amounts would have been 
distributed normally. 

It is further assumed that when children made no choice their 
choices, had they made them, would have been distributed between 
the categories of acceptance and rejection as were those they did make.
But, of the 502 choices of Table VIII, there were only twenty cases of
no choice. 

Table VII exhibits relevant data and constants: bi-serial r’s 
between the categorical facts of choice and continuous variables of 
judgment of amount of acceptance and rejection,—the thing chosen 
being paired, in the correlations, with judgment of response to the 
thing chosen. These bi-serial r’s are twelve in number, have a mean 
of .34, a range from .06 to .77, and an SD of the distribution of r’s of 
.20. They are based upon 1944 judgments by H and 1944 by K, 
each unit in the body of the table being a mean of one estimate by H 
and one by K, paired with the fact of choice. These correlations are, of 
course, not directly comparable with each other, being computed from 
data of different dispersion. The bi-serial $SD\sqrt{1-r}$’s rest, as do the
bi-serial r’s, upon the assumption that the trait measured categorically 
is in reality continuous and normally distributed. Being independent 
of range, they are directly comparable. They appear in the last 
column of Table VII, with a mean of 1.5, an SD of 0.27, and an SD
of the mean of 0.06. Taken together, they are statistically reliable.
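
A bi-serial r of the kind described here may be sketched as follows. The sketch assumes the standard textbook formula, (M_p − M_q)/SD × pq/y, where y is the ordinate of the unit normal curve at the point of dichotomy; the source does not spell out its computing formula, and the ratings and choices below are invented for illustration, not taken from Table VII.

```python
from statistics import NormalDist, fmean, pstdev


def biserial_r(scores, chosen):
    """Bi-serial r between continuous scores and a two-way category
    that is assumed to conceal a normally distributed variable."""
    hi = [s for s, c in zip(scores, chosen) if c]
    lo = [s for s, c in zip(scores, chosen) if not c]
    p = len(hi) / len(scores)          # proportion in the "chosen" category
    q = 1.0 - p
    std = NormalDist()
    y = std.pdf(std.inv_cdf(q))        # ordinate at the point of dichotomy
    return (fmean(hi) - fmean(lo)) / pstdev(scores) * (p * q / y)


if __name__ == "__main__":
    # Hypothetical judged amounts of liking and the matching facts of choice
    ratings = [2, 5, 3, 6, 4, 7, 5, 8]
    chosen = [False, True, False, False, True, True, False, True]
    print(round(biserial_r(ratings, chosen), 2))  # 0.67
```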

In order to test the validity of judgment by comparison with chil- 
dren’s choices as a criterion, the following method was used to deter- 
mine, upon the basis of choice alone, the average amount of liking 
for each stimulus. 

Assuming, for instance in the case of water and orange juice, a 
normal distribution having a dichotomy so placed that p is thirty-one 
per cent and q is sixty-nine per cent instead of the categorical distri- 
bution given in Table VIII, read from the Kelley-Wood Table the 
mean deviation, 1.14 SD, of the tail containing thirty-one per cent of 
the measures, and similarly for the mean deviation, 0.51 SD, of the 
tail containing sixty-nine per cent of the measures. The difference 
between the mean deviation of the two tails is an estimate of the 
amount of difference of liking for w and o. In other words, the average
distance of the measures of liking for orange juice from the mean of 
the total distribution of choices and rejections of orange juice was 
found, and similarly the average distance of the responses to water 
from the mean of the total distribution of rejections and choices of 
water. These two distributions have identical SD’s since the cases of 
rejection of one taste are those of acceptance of the other, and vice 
versa. Orange juice was farther from the mean than water, the amount
by which it was farther away (0.63 SD) being a measure on the average
of the difference of liking for the two.

The differences for the other eleven pairs of tastes were determined
in the same manner. 
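
The readings taken from the Kelley-Wood Table for water and orange juice can be reproduced from the unit normal curve. In this sketch the mean deviation of a tail containing proportion p is taken as the ordinate at the cut point divided by p, which is the usual property of the normal curve and is assumed here to be what the table tabulates; the function name is illustrative.

```python
from statistics import NormalDist


def tail_mean(p: float) -> float:
    """Mean distance, in SD units, from the mean of the whole unit normal
    distribution to the mean of the tail containing proportion p."""
    std = NormalDist()
    cut = std.inv_cdf(1.0 - p)   # point of dichotomy with proportion p beyond it
    return std.pdf(cut) / p


if __name__ == "__main__":
    small_tail = tail_mean(0.31)   # tail containing thirty-one per cent
    large_tail = tail_mean(0.69)   # tail containing sixty-nine per cent
    print(round(small_tail, 2))               # 1.14
    print(round(large_tail, 2))               # 0.51
    print(round(small_tail - large_tail, 2))  # 0.63
```

The same function applied to the proportions of choice for any other pair would give the corresponding difference in SD units.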

Now it is possible, by combination, to find several estimates of the 
same distance. Knowing, let us say, distance AB by direct measure 
as above, and also knowing distances AO and OB, we can by addition 
obtain a second, indirect, partly independent measure of the distance 
AB. A number of such indirect, partly independent estimates appear 
in Table IX, together with the direct estimate. This is the case of 
water and honey, the distance between the two in the affection of these 
children being estimated at 0.81 SD. 

The first, third, and fifth of the differences needed (Table X) are 
measurable as in Table IX, compositely, by averaging together
a direct measure and four indirect measures of the same difference. 
But the second and fourth of these differences, that between orange 
juice and honey, and that between sugar and chocolate, had only indi- 
rect measurements like the indirect measurements of Table IX. The 
results are given in Table X. 

Giving water an arbitrary value of zero, and adding successive 
differences, without introducing any correction for variation in stand- 
ard deviations of the things added, we obtain, as seen in Table XI, 
a series of values showing relative placements derived from the data 
of choice alone and not at all from the data of judgment. 

Table XI also shows the average amount of liking for each stimulus 
as judged, which in turn is, in its derivation, independent of the facts 
of choice. 

The correlation between these two series, thus independently 
obtained, is +.93. This is a Pearson product moment correlation of 
validity of judgment on the average, with choice on the average as a 
criterion. 

In order to correct the result for errors of measurement we must 
have knowledge of the reliability both of judgment and of choice. The 
reliability of the means of the judgments may be estimated from the 
data of responses to stimuli which were repeated within three or four 
minutes. These stimuli were the ones that went into the making of 
the scale of amounts of liking based upon choice. This correlation of 
.98 is shown in Table XII, and the standard deviations associated in 
it are just about equal to those associated in the validity coefficient. 
We may surmise that this estimate is too high as a measure of the
reliability of mean choices, but if it is so, it is for our purpose
conservative. The validity coefficient of .93 is therefore probably subject to
upward revision on account of errors of measurement, and is perhaps
to be quoted at +.95. These means, therefore, have a degree of
validity as well as of reliability-objectivity. There is lacking of course a
knowledge of other correlations determined in known ranges of affec- 
tive behavior and by other investigators, with which to institute com- 
parison; and likewise there is lacking any conventionally determined 
standard range for the purpose of comparison of correlations, as for 
instance Grade IX, or the twelve-year-old group, for the comparison 
of correlations of mental traits in children. As nearly as can be told 
the total range of affection occupied by these means is not above ten 
per cent of the total range of sixty points. 
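
The upward revision from .93 to about .95 is consistent with the usual correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliabilities. The sketch below assumes that this is the correction intended, and that the .98 of Table XII is taken as the reliability of both series; the function name is hypothetical.

```python
import math

def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Correlation corrected for attenuation: r_xy / sqrt(r_xx * r_yy)."""
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)

# Observed validity coefficient and the reliability estimate from the text.
# Using .98 for both judgment and choice is an assumption made here.
corrected = correct_for_attenuation(0.93, 0.98, 0.98)
print(f"corrected validity = {corrected:.2f}")
```

If the true reliability of mean choices is lower than .98, as the text surmises, the corrected figure would be somewhat higher, which is why the estimate is called conservative.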


SECOND APPROACH TO THE MEASUREMENT OF VALIDITY 


It seems possible that even if judgments were valid and reliable 
with new subjects at the beginning of an experiment, they might be 
less so after different amounts of learning to like or to dislike had taken 
place. The point has importance because demonstration of the fact 
and amount of change is dependent upon the possibility of holding the 
validity and reliability of measures throughout the progress of an
experiment. Judgments, in order to be and remain valid, must vary 
concomitantly with choices. 

A special case of such concomitant variation is that in which judg- 
ments tend to be reversed when choices are, and not reversed unless 
choices are. An experiment was designed for the purpose of making 
this test. 

It was desirable to this end to find stimuli differing in degree from 
each other enough to make a large majority of children prefer the same 
one at the outset, and at the same time differing so little that a large 
majority, which at first liked one better than the other, could be taught 
to reverse this order of liking. The selection of stimuli for such a com- 
pound end is a nice matter: because, erring toward too great difference, 
it jeopardizes reversal of choice as an outcome; and erring toward too 
little difference, it endangers possibility of discrimination. 

Thirteen nursery school children were found who, like most of 
the children of previous experiments (particularly of Series VII) chose 
a salt solution as against diluted vinegar. It was our purpose to cause 
the children to like vinegar as much as possible, and if possible better 
than salt. This end was attempted by giving vinegar to the subjects 
daily, or nearly so, accompanied by chocolate given before and after 
the vinegar, until the disliking for vinegar was reduced to a constant
amount and resisted further attempts at reduction. Salt was, of
course, not administered during this period. Then, after each subject 
had come to his plateau and remained there day after day, salt and 
vinegar were both administered, children’s choices between them 
being recorded for three to five successive experimental periods. Two 
experimenters, G and B, measured, by judgment in the manner pre- 
viously described, both at the beginning and at the end, the amount of 
liking for the two tastes. 

The result was a change of choice as made on the part of the group 
from salt to vinegar, there being at first thirteen to none for salt, 
and at last eight to one for vinegar, with the four remaining children 
changing from salt to uncertainty, choosing now one and now the other. 
Preferences as rated changed in the same direction, from salt toward 
vinegar, but not as far: at first there were eleven to none for salt, two 
being uncertain, and at last six to six, and one uncertain. 

All judgments in the first and last periods were of two observers, 
G and B, excepting that at the initial judgments five of the children had 
only one judge, B. G had the longest experience of all the observers 
associated with me during the two years, and B as little as any, namely 
a brief training period. Accordingly B’s ratings were corrected so as 
to have their standard deviation and mean equal to those of G. This 
was done separately for salt initially, salt finally, vinegar initially and 
vinegar finally. This refinement, however, made no difference in 
any final comparison. 

The PE√(1 − r) for this series (IX) was 1.2 in the scale of sixty,
and the PE√(1 − r) for the two observers taken together was 0.5.
This latter is relevant to the judgments in Periods I and III. The
corresponding PE for Period II, in which B judged alone, was
probably about one point, though no conclusion rests upon this fact. The
PE of the difference between the means of S_I and those of V_I,
−0.16 − (−1.30), in Table XVII, is 0.25.

The mixture of two taste stimuli upon the tongue, a varying portion
of one remaining throughout the one-minute interval between
stimuli to mingle with the next stimulus, may, for all we know, have
had a variable effect both upon choices and upon other observable
behavior, and consequently upon the judgments of amount of liking
for the second stimulus. For this reason, no single measure, either
by choice or by judgment, is presented as a measure of amount of
liking for one stimulus by one child at one time. But the important
point is that such averages as were obtained were of a number of
measures taken on different days, from a number of children, and as such
would tend to eliminate chance fluctuations of this sort. The
assumption appears excellent that these variations, if they exist, would be
distributed in a random manner throughout the duration of the
experiment with the thirteen children taken together as a group. There were
probably variations in the time interval, in the humidity of the
surface of the tongue, in the placement of the liquids in the mouth, a list
which could be augmented. However, correlation between such
elements and change in amounts of liking from Period I to Period III
seems unlikely, and the assumption of chance variation reasonable.
Such factors could not be called upon to account for the reversal of
choice or of judgment. Further, it seems likely, both from watching
children and from adult experiences in tasting as the children tasted,
in the same experimental setting, that if mixture occurred, constant
effects were small.

Table VI exhibits the facts as to the tendency
for judgments of amount to be reversed when choices are reversed
and not otherwise. The correlation expressing this tendency toward
reversal is +.59, with a PE of 0.18; n is 13.

This experiment had its limitations. The number of children was
too small and the range in which these measures were compared and
correlated was very small, the SD√(1 − r) of ratings for salt being
about 0.9 and for vinegar about 1.4. In a range having an SD√(1 − r)
of ratings for salt of 3.5, an r of +.59 becomes an R of .9+, a figure
comparable with the R’s of Table XVIII. We may surmise that had we
been dealing with differences averaging above 3 the correlation would
be above .9. It seems likely that we could in another such experiment
come out at the end with much larger differences by the simple means 
of using a much stronger solution of vinegar, or one of powdered 
columbo, and as a control taste a somewhat stronger solution of salt. 
Let the salt be as strong as can be, and at the same time be almost 
always chosen by 100 children as against columbo. Now, from Series 
VIII especially, with corroboration from Series I and VII, we already 
know that responses of dislike as extreme as —10 can be reduced 
nearly to zero. Response to strong salt solution may be expected to 
remain unchanged, since it is not repeated until the moment of final 
comparison. Consequently we would expect at the end larger differ- 
ences between salt-liking and columbo-liking. 

The case cannot rest on thirteen children and an r equal to .59, 
uncorrected for smallness of n. But the results accord with those of
the other and independent approach to the problem of validity. The
correlation is at least as high as the close discrimination involved 
would allow us to expect. The method seems capable of extension 
and promising of conclusive results. 


ISOLATION OF RESPONSES IN AFFECTIVE VACUA


It was my purpose in Series VIII to simplify the experimental situa- 
tion to an extreme degree; to come as nearly as possible to observation 
of one response by itself. If we are to study experimentally the effect 
of a single affective response, R₂, upon another, R₁, all other affective
responses must be held constant. Then if the two responses are
causally associated, one will vary when the other varies and the
relationship of concomitant variation will become discernible.

Other things equal, it is better to render irrelevant factors in fact 
constant, than to estimate by means of partial correlation the probable 
results of actual control. Now the zero point between liking and dis- 
liking is the most readily useful as a point at which to hold affective 
response constant. It is the only point in the scale which has absolute 
determination and which is in the long run probably both correctly 
and meaningfully placed. It is likely in this context that the SD of 
zero is less than that of any other magnitude. Doubtless it is better 
to have other affective responses absent than to have them constant at 
+10 or —10 or anywhere else. Probably also the zero condition is 
easier than any other for an experimenter to maintain in his subjects, 
especially in view of the tendency of every response studied to move 
toward zero with the passage of time and repetition. 


THE ZERO PERIOD OF SERIES VIII


An affective vacuum is, of course, not environment without stimulus
and organism without response. It is rather environment-organism
differing in one important respect from the usual. A child sits in an
accustomed room. An adult’s face makes him smile pleasantly. The
ratings of three adults are +4, +6, +6. On the next day the adult
avoids looking at the child, who notices the avoidance. Ratings:
−1, −2, −4. On the next day the adult succeeds in so acting that
she gets no affective response. Ratings: 0, 0, 0. Similarly for every
affective response that appears. Ratings come to be zeros day after
day with great uniformity. The affective vacuum is established, but
it has to be maintained throughout the long, often very tedious Series
VIII. Every observed sign of liking-disliking for anything during
the whole series was thus rated, and its object thus removed or
modified, if possible. So far as a certain conclusion of the experiment is
true, it is also likely that such affective responses as escaped
observation and elimination became smaller and smaller as the days passed.

It is not for a moment to be supposed that the vacuum is a nearly 
perfect one. Besides unobserved and unsuspected affective responses, 
there may be a constant series of such responses not purposely intro- 
duced by the experimenter, having some potency opposite in sign to 
those purposely introduced, and following regularly and immedi- 
ately upon them: Pleasure may be followed by displeasure that 
pleasure is gone, and pain by pleasure that pain is gone. 


FIRST PERIOD OF SERIES VIII


Having such a partial affective vacuum established at the Ethical 
Culture School with children of a first grade, a variety of tastes, some 
pleasing and some displeasing, some mild and some strong, were 
selected by means of measures obtained in previous series. These 
were chocolate, salt, vinegar and the like, as listed for Series VIII in 
Table XV. Each child came daily and tasted one and only one sub- 
stance for weeks, some tasting chocolate, some salt, etc. 

The changes in response of the different children showed uniformity 
in several important respects. 

First Uniformity.—The children liked tastes, if pleasant, less and
less as the days passed; and if unpleasant, disliked them less and less.
There was not one exception. Table XVIII shows r_IC, and R_IC after
correction for dispersion (purely for the sake of comparability), between
the amount by which a taste was liked at first (I) and the amount of
its change toward zero (C). r_IC is −.99 and R_IC is −.95. (Series
VIII, first period, Table XVIII.)

Second Uniformity.—These stimuli were continued, each child
having the same stimulus daily, until it was judged that a plateau
was reached in each case. The evidence for the existence of plateaus
is seen in Table XIV in the last few entries in each row, that is, for
each child; and seen again in Table XVI in the averages for all
individuals for the last eight days for positive and negative stimuli
separately as compared with the entries in Table XV for the first days for
positive and negative responses separately.

Third Uniformity.—None of these individual graphs crossed
the zero line during the first period even for a single day; never did
liking turn to disliking, nor disliking to liking. I have no evidence
that if events of the first period had continued in the same manner
for weeks or months longer than they did, they might not have brought
instances of crossing the zero line. I know of no reason to think that
they would. None did in the time allowed, and all apparently reached
plateaus.


SECOND PERIOD OF SERIES VIII


Responses, having come to a standstill at the end of the first period, 
were observed during the second period with one and only one circum- 
stance altered. A second stimulus was introduced, the stimulus of 
the first period being continued and the new stimulus being given a 
minute before and again a minute after the other. A child who tasted 
honey throughout the first period, was given vinegar, then honey, 
then vinegar, about a minute apart, throughout the second period. 
Always the new stimulus was opposite in sign to the old, one being 
agreeable and the other disagreeable. 

Upon the introduction of this second stimulus, the plateaus were
broken. The most striking observation is that nine of the thirteen
children crossed the zero line at some time, and five of the thirteen
crossed it and remained somewhat consistently on the other side.
In the case of positive stimuli like honey, raisins, and chocolate, the
average moved from +2 at the end of the first period to +1 at the
end of the second; and in the case of negative stimuli like vinegar,
salt and egg white, the average moved from −2 at the end of the
first period to −4 at the end of the second. The suggestion is strong
that these movements of both positive and negative averages toward
zero, and of certain individuals beyond zero, were associated with the
introduction of a second stimulus of sign opposite to the first. I call
this conditioning, and I define affective conditioning as change in
amount of liking for a stimulus, R₁, which shows concomitant
variation with change in amount of response to some other stimulus, R₂.
Correlation is a measure of the closeness of concomitant variation.
In this instance, R₁ is the response of the first period which was
continued through the second; R₂ is the new response introduced at the
beginning of the second period into an affective vacuum which, apart
from the presence of the single response already purposely evoked, was
as perfect as we could make it. Table XV shows that the more
pleasing R₂ as compared with R₁, if negative, and the more displeasing R₂
as compared with R₁, if positive, the greater is the change in R₁. The
differences between R₁ and R₂ (at the beginning of Period II) are
correlated with the amount of change in R₁, both being taken
positively as distances. In Table XV, that is, the difference between the
two columns I of the Second Period is paired with the first column C
of the Second Period, the pair in the first row being [4 − (−8)], −4,
and in the second row [7 − (−8)], −7, etc. Subject Number 9 was
omitted as having too short a second period and as not having yet
significantly begun to change, but the omission affects the correlation
very little. The inequality of length of the second period in the
remaining cases may have operated to reduce the correlations reported,
which are perhaps therefore conservative. The correlation between the
difference between R₁ and R₂ at the beginning of the second period
and the change in R₁ is +.7.
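
The pairing described above can be sketched as follows. The figures are the second-period columns of Table XV as they can be read in this copy; several minus signs for subjects 8 to 14 are not legible and are restored here on the assumption that initial rating plus change equals final rating. The helper `pearson_r` and all variable names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# (subject, initial R1, change in R1, initial R2), second period, Table XV.
# Signs for subjects 8-14 are restored by assumption (I + C = F).
rows = [
    (1, 4.0, -4.0, -8.0), (2, 7.0, -7.0, -8.0), (3, 6.0, -6.0, -6.0),
    (4, 3.0, -1.0, -7.0), (5, 4.0, -3.0, -7.0), (6, 3.5, -2.5, -6.0),
    (7, 3.5, -0.5, -2.5), (8, -4.5, 1.5, 9.0), (9, -5.0, 6.0, 5.0),
    (10, -1.0, 2.0, 6.0), (11, -1.0, 2.0, 7.0), (12, -3.0, 1.0, 3.0),
    (13, -3.0, 4.0, 8.0), (14, -3.0, 4.0, 7.0),
]

def correlation(data):
    # Both quantities are taken positively, as distances.
    distance = [abs(i1 - i2) for _, i1, _, i2 in data]
    change = [abs(c1) for _, _, c1, _ in data]
    return pearson_r(distance, change)

r_all = correlation(rows)
r_without_9 = correlation([row for row in rows if row[0] != 9])
print(f"all rows: {r_all:.2f}; subject 9 omitted: {r_without_9:.2f}")
```

On this reading both figures round to the +.7 reported, and the omission of subject 9 changes the coefficient only slightly, in keeping with the remark in the text.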

The case cannot be rested on twelve subjects. These experiments 
are, of course, preliminary. But the results are positive and with 
further invention they are promising of conclusive results. 

The total case for R₂ having changed R₁ is strengthened by the
following considerations based upon the data in Dr. Gauger’s thesis,
as well as upon the tables given hereafter:

It was not judged that a plateau had been reached until as many
experimental days had passed as it took to initiate the more retarded
plateaus in Series VII, none of which ever showed signs of breaking.

In no case did R₁ cross zero before R₂ was introduced.

After R₂ began, five crossed zero from negative to positive, only
two of the negatives, both apparently too brief, failing to do so. No
responses originally positive crossed zero. Eight R₁’s and their R₂’s
came together, but none crossed each other, so that things as far apart
affectively as eight to fifteen units came, in the course usually of about
twenty experimental days after introducing R₂, to be liked equally
well.

No R₁ ever crossed its R₂ either in this series or in Series IX.

Corroborative also are correlations given in Table XIII, Series VII. 


CHANGE IN AFFECTIVE RESPONSE 


No instance of sustained movement away from zero was observed
in any series in any child. No instance of response held constant
from the beginning was observed in any series in any child. Every
graph turned soon toward zero, both from above and from below.
The high degree of uniformity in this respect is measured in Table
XVIII, showing numerous correlations between the positive or
negative strength of affection at the beginning and the amount of change.
Without exception, if R₁ was plus, the change was minus or down
toward zero, and vice versa. This was true for ten different stimuli
and for about fifty different children. It was true in as good an
affective vacuum as I could make, and it was true when R₂, R₃, etc. were
purposely introduced to study their association with R₁. Only one
correlation of this type in an affective vacuum was obtained, R =
−.95, that in Series VIII, stimulus 1, first period, Table XVIII. Four
such correlations, R = −.98 and R = −.93, Series VIII, second
period, and R = −.63 and R = −.62, Series IX, were obtained with
only R₁ and R₂ purposely introduced, other R’s being purposely
excluded. The other more numerous correlations in Column R_IC,
Table XVIII, are from series in which more than two affective responses
were sought, 3, 4, and 8 responses being associated.

We are therefore held from saying that in Series I, II and VII the
change in R₁ was due to the presence of R₂, R₃, etc. We may suspect
that R₂, R₃, etc. were not without their influence, since the introduction
of R₂ in Series VIII appears to have effected a further change in R₁.
It would seem possible that an R₂ introduced at the beginning (rather
than after R₁ had reached its plateau as in Series VIII) would bring
about both a faster and a greater change in R₁. Of this however we
have no evidence.


TABLE I.—Series I. AVERAGE OF PE√(1 − r)’s FOR THE LAST TWO-THIRDS OF


TABLE III.—Series VII. COEFFICIENTS OF RELIABILITY-OBJECTIVITY FOR ONE
SUBJECT FOR EIGHT DIFFERENT STIMULI ON EACH OF TWELVE DIFFERENT DAYS

Day    n (stimuli)     σ       r     PE√(1 − r)
 1         8         6.50     .95      0.98
 2         8         4.91     .87      1.19
 3         8         5.90     .95      0.89
 4         8         5.16     .97      0.60
 9         8         7.22     .99      0.49
10         8         6.70     .98      0.64
20         8         5.10     .99      0.34
21         8         5.58     .98      0.53
22         8         4.96     .98      0.47
29         8         5.10     .98      0.49
30         8         2.84     .96      0.38
31         8         3.27     .98      0.31
Average                                0.6

This table shows improvement in judgment during experimentation from a
PE√(1 − r) of about 1.0 to one of about 0.4.
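
The last column of Table III appears to be reproducible as 0.6745 σ √(1 − r), the probable error being 0.6745 times the corresponding standard error. That reading is an inference from the printed figures rather than a formula stated at this point in the text; the sketch below checks it against each row. The function and variable names are illustrative.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error (normal curve)

def pe_of_measurement(sigma: float, r: float) -> float:
    """Probable error of a single rating: 0.6745 * sigma * sqrt(1 - r)."""
    return PE_FACTOR * sigma * math.sqrt(1.0 - r)

# (day, sigma, r, printed PE*sqrt(1 - r)) as read from Table III.
table_iii = [
    (1, 6.50, 0.95, 0.98), (2, 4.91, 0.87, 1.19), (3, 5.90, 0.95, 0.89),
    (4, 5.16, 0.97, 0.60), (9, 7.22, 0.99, 0.49), (10, 6.70, 0.98, 0.64),
    (20, 5.10, 0.99, 0.34), (21, 5.58, 0.98, 0.53), (22, 4.96, 0.98, 0.47),
    (29, 5.10, 0.98, 0.49), (30, 2.84, 0.96, 0.38), (31, 3.27, 0.98, 0.31),
]

for day, sigma, r, printed in table_iii:
    computed = pe_of_measurement(sigma, r)
    print(f"day {day:2d}: computed {computed:.2f}, printed {printed:.2f}")

average = sum(pe_of_measurement(s, r) for _, s, r, _ in table_iii) / len(table_iii)
print(f"average = {average:.1f}")
```

Every computed value agrees with the printed column to two decimals, and the average agrees with the 0.6 given at the foot of the table.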


TABLE IV.—Series I. SD√(1 − r) FOR EACH OF SIX SUCCESSIVE PERIODS

Row           1         2        3     4     5     6    Averages
[illeg.]     4.2       3.7      1.8   1.8   1.5   2.6     2.6
[illeg.]   [illeg.]  [illeg.]   3.1   3.5   1.5   1.8     2.7
[illeg.]     3.9       3.3      2.5   2.7   1.5   2.2     2.7

The improvement in reliability-objectivity of judging (judge correlated with
judge) resulted in nearly halving the average amount of chance error during the
process of the six periods from beginning to end of Series I.








TABLE V

Series    Average [illeg.]√(1 − r)
I                  1.8
VII                0.7
VIII               0.8






















CONDITIONED RESPONSE AND NEGATIVE ADAPTATION 


I use the words conditioned response to mean change in amount of R₁
correlated with amount of R₂; and negative adaptation to mean change
in amount of R₁ not correlated with amount of any R₂. It cannot be
said that changes in R₁ in the partial affective vacua of Series VII
were instances of negative adaptation, because of the possibility that
the influence that brings an R₁ upward toward zero is pleasure that
pain is gone; and downward toward zero, pain that pleasure is gone.
What would happen if the children had had all they wanted of
chocolate or raisins cannot be said. They had very little, e.g., chocolate
as big as a pea. So that, for all we know, what might appear at first
sight to be negative adaptation is really conditioning by an R₂ which
was neither of our own willing nor within our means of observation.


CHANGE IN R₁ AND SURVIVAL


Whatever the relationships of mutual implication may turn out 
to be, the most undoubted single uniformity in all of the experimental 
series is that of change toward indifference. This may be a basic 
trait in surviving organisms. That organism would tend to survive, 
the theory might run, which tended toward growing economy of 
response to situations as they became familiar, in favor of a reserve 
of response for novel situations, and hence an increased repertory of 
response. It is perhaps more satisfying to turn from situation to 
situation than to remain content with monotony. Judging from the 
results of Series VII, such economies, once well established, may be 
persistent. 


TABLE VI.—Series VIII. CHANGES IN PREFERENCE AS INDICATED BY RATINGS
OF AMOUNT OF LIKING FOR SEPARATE STIMULI


TABLE VIII.—Series III

Stimulus        Frequency   Proportion   AD of tail   Difference

Water              11        [illeg.]     [illeg.]
No choice           7
Orange juice       25          .69          .51           .63

[illeg.]            2          .05         2.06
No choice           2
[illeg.]           36          .95          .11          1.95

[illeg.]           17          .425         .92
No choice           2
[illeg.]           23          .575         .68           .24

Orange juice       19          .5           .80
No choice           4
[illeg.]           19        [illeg.]       .80           .00

[illeg.]           16          .39          .98
No choice           2
[illeg.]            2          .61          .63           .35

[illeg.]           27          .64          .58
No choice           1
[illeg.]           15          .36         1.04          −.46

[illeg.]           24          .56          .70
No choice           2
[illeg.]           19          .44          .90          −.20

[illeg.]           21          .49          .81
No choice           1
[illeg.]           22          .51          .78           .03

Honey            [illeg.]      .72          .47
No choice           1
[illeg.]           12          .28         1.20          −.73

[illeg.]           10          .28         1.32
No choice           0
Chocolate          34          .97          .39           .93

Chocolate          32          .86          .86
No choice           0
[illeg.]            5          .14         1.59         −1.33

[illeg.]           19          .51          .78
No choice           1
[illeg.]           18          .49          .81          −.03

Totals (row labels illegible): 229, 20, 253; in all, 502.

Explanation.—In the first group, when water and orange juice were alternatives,
water was chosen eleven times, orange juice twenty-five times, and there were
seven times in which no choice occurred.
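
The "AD of tail" entries appear consistent with the mean distance, from the center of a unit normal curve, of the segment cut off by each proportion: the ordinate at the cut divided by the area of the segment. The Difference column is then the first tail mean less the second. This is an inference from the legible figures, not a restatement of the author's procedure; the sketch uses `statistics.NormalDist` from the Python standard library, and the function names are hypothetical.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()  # mean 0, SD 1

def tail_mean(p: float) -> float:
    """Mean distance from the centre of the segment holding proportion p
    of a unit normal distribution, the curve being cut at one point."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = _UNIT_NORMAL.inv_cdf(1.0 - p)   # cut point leaving p above it
    return _UNIT_NORMAL.pdf(z) / p      # ordinate divided by area

def scale_difference(p_first: float) -> float:
    """Tail mean of the first alternative less that of the second, the two
    being chosen in proportions p_first and 1 - p_first."""
    return tail_mean(p_first) - tail_mean(1.0 - p_first)

# Proportions taken from legible rows of Table VIII.
for p in (0.31, 0.05, 0.425, 0.5):
    print(f"p = {p}: AD of tail {tail_mean(p):.2f}, "
          f"difference {scale_difference(p):.2f}")
```

With the rounded proportions .31 and .69 of the first group this gives a difference of about .63, and with .05 and .95 about 1.95, matching the printed column. Equal proportions give a tail mean of about .80 on each side and a difference of zero.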


TABLE IX.—Series III

h − w                   .73   direct measure
(o − w) − (o − h)       .33   indirect measure
(c − w) − (c − h)       .40   indirect measure
(s − w) − (s − h)      1.49   indirect measure
(r − w) − (r − h)      1.09   indirect measure

Average                 .81








TABLE X.—Series III. RESULTING DISTANCES


TABLE XI.—Series III

(J) Absolute scale based upon judgments


TABLE XII.—Series III. JUDGMENTS IN THE CASE OF REPEATED STIMULI

Stimulus    M₁      M₂     Mean       r        n     [illeg.]√(1 − r)
w          0.22    1.14    0.68      .64       28        1.12
o          3.40    3.36    3.38    [illeg.]    77        0.82
s          5.08    5.36    5.22      .60      101        0.78
r          6.04    6.08    6.06      .71       90        0.63
h          4.56    4.60    4.58      .65       78        0.63
c          5.02    5.00    5.01      .54      106        0.64

Total                                         480   Average 0.77

There were four hundred and eighty judgments in Series III by H and four
hundred and eighty by K, a total of 1920 judgments, represented in each of the
first two columns.

The correlation between column 1 and column 2 is .98 in a range extending
from 0.68 to 5.01, or about 7 per cent of the estimated total range of affective
response.


TABLE XIII.—Series VII. CORRELATIONS OF DIFFERENCES WITH CHANGE


In Agreeable Stimuli 


C; minus S; with change in C, 
C; minus V with change in C, 
C,minus V with change in C, 


In Disagreeable Stimuli 


C, minus S; with change in S,; 



C; minus E with change in E


C, minus S2 with change in S,


C, minus E with change in E
C,; minus V with change in V 


Stimuli: chocolate, salt, vinegar, egg white.
Data from Gauger, loc. cit.


TABLE XV.—Series VIII

           First period                      Second period                  Second period
           (one response daily)              (same response continued)      (second response added)
Subject      I        F         C       DA     I      F      C     DA        I      F      C     DA     S
 1          6.0   [illeg.]  [illeg.]    29    4.0    0.0   −4.0    29      −8.0    0.0    8.0    29    CCV
 2          7.0   [illeg.]  [illeg.]    32    7.0    0.0   −7.0    12      −8.0    1.0    9.0    12    CCE
 3          7.0   [illeg.]  [illeg.]    24    6.0    0.0   −6.0    19      −6.0    0.0    6.0    19    CCE
 4          6.0   [illeg.]  [illeg.]    29    3.0    2.0   −1.0    29      −7.0   −2.0    5.0    29    CCV
 5          8.0   [illeg.]  [illeg.]    22    4.0    1.0   −3.0    30      −7.0    1.0    8.0    30    HHS
 6          9.5   [illeg.]  [illeg.]    24    3.5    1.0   −2.5    26      −6.0   −1.0    5.0    26    HHV
 7          9.0   [illeg.]  [illeg.]    21    3.5    3.0   −0.5    22      −2.5   −0.5    2.0    22    HHE
 8        −12.0   [illeg.]  [illeg.]    22   −4.5   −3.0    1.5     6       9.0    4.0   −5.0     6    EEH
 9        −10.0   [illeg.]  [illeg.]    26   −5.0    1.0    6.0    24       5.0    0.0   −5.0    24    VVH
10         −2.0   [illeg.]  [illeg.]    32   −1.0    1.0    2.0    14       6.0    1.0   −5.0    14    EER
11         −4.0   [illeg.]  [illeg.]    30   −1.0    1.0    2.0    27       7.0    1.0   −6.0    27    SSC
12         −4.0   [illeg.]  [illeg.]    27   −3.0   −2.0    1.0     4       3.0    2.0   −1.0     4    EES
13        −10.0   [illeg.]  [illeg.]    19   −3.0    1.0    4.0     9       8.0    3.0   −5.0     9    EER
14        −10.0   [illeg.]  [illeg.]    24   −3.0    1.0    4.0    29       7.0    1.0   −6.0    29    VVC

S, stimuli
F, final
I, initial
DA, number of days
H, honey
V, vinegar
E, egg white
C, chocolate
S, salt solution
R, raisins
HHV, honey in first and second periods, and vinegar added in the second.

The first line is read as follows: One child liked chocolate +6.0 at the beginning
of the first period of twenty-nine days, and +2.0 at the end. Then, continuing on
successive days of the second period, liked it +4.0 at the beginning of the second
period and 0.0 at the end; and liked vinegar −8.0 at the beginning of the second
period and 0.0 at the end.


TABLE XVI.—Series VIII. RATINGS FOR EACH PERIOD DURING FINAL PLATEAUS
OF EIGHT DAYS

Period        1      2      3      4      5      6      7      8
Positive Responses
I            3.1    2.6    2.3    2.4    2.6    2.3    2.7    2.0
II           3.4    2.9    3.0    3.1    2.7    3.6    2.9    3.4
III          3.5    3.3    2.6    3.0    1.4    2.8    1.9    1.4
Negative Responses
I           −3.3   −3.1   −2.7   −1.7   −2.1   −1.7   −1.7   −1.7
II          −3.3   −3.1   −2.7   −2.1   −1.7   −1.7   −1.7   −1.7
III         −2.0   −2.8   −2.1   −1.9   −1.6   −1.0   +0.7   −0.9
Averages     3.1    2.9    2.6    2.4    2.0    2.1    1.7    1.85

The signs of the twenty-four entries under Negative Responses were reversed
before averaging.


AVERAGES OF RATINGS DURING THE LAST SIXTEEN DAYS IN SERIES VII, FOR ALL
CHILDREN FOR ALL TASTES (SIXTEEN CHILDREN AND EIGHT TASTES)

Day        1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
Average   2.4  2.4  2.5  2.4  2.3  2.1  2.3  2.1  2.2  2.2  2.1  2.0  2.0  2.0  2.0  2.0

The signs of Negative Responses were reversed as in Series VIII, this table.


TABLE XVII.—Series IX

C_I, initial choice; C_F, final choice; S_I, initial rating for salt; V_I, initial
rating for vinegar; S_F, final rating for salt; V_F, final rating for vinegar;
P_I, initial preference as estimated from ratings; P_F, final preference as
estimated from ratings.

Subject    C_I   C_F     S_I      V_I       S_F        V_F       P_I        P_F
 1          S     V     −2.1     −2.2     −0.75       0.25        =          V
 2          S     =      1.8      1.7      0.1        2.3      [illeg.]   [illeg.]
 3          S     V      0.3     −1.7     −0.125      0.5         S          V
 4          S     =      0.2     −0.2      1.3        0.05        S          S
 5          S     =      0.4     −0.6      0.0       −0.25        S          S
 6          S     S     −1.8     −2.8     −0.125     −1.55        S          S
 7          S     V      2.6      1.9    [illeg.]     2.2      [illeg.]      =
 8          S     V     −1.2     −1.8      1.0        2.0         S          V
10          S     V     −0.9     −2.3      0.2       −3.7         S          S
11          S     =      1.5     −1.6      0.4       −2.3         S          S
12          S     V     −1.0     −3.6      0.0        0.15        S          V
13          S     V      1.5      0.5     −0.375     −1.05        S          S
14          S     V     −0.5     −0.9     −0.5        1.0         S          V
Average     S     V     −0.16    −1.30     0.18       0.18        S          =

The signs of equality represent indifference. 
TABLE XVIII

Series  Stimuli                       r_IC    SD     R_IC     N
I       raisins                       -.86    4.06   -.82      7
I       square                        -.30    5.09   -.21      7
I       columbo                       -.53    6.55   -.32      7
I       triangle                      -.74    6.26   -.52      7
I       All four                      -.57    2.00   -.20     28
II      orange juice                  -.89    1.83   -.97      4
II      vinegar                       -.94    2.12   -.98      4
VII     salt 1                        -.90    1.48   -.98     16
VII     chocolate 1                   -.96    1.80   -.99     16
VII     egg white                     -.96    2.15   -.98     16
VII     chocolate 2                   -.86    1.40   -.97     16
VII     salt 2                        -.94    2.86   -.96     16
VII     chocolate 3                   -.90    1.11   -.99     16
VII     vinegar                       -.91    3.91   -.89     16
VII     chocolate 4                   -.90    1.06   -.99     16
VII     All eight                     -.93    6.24   -.82    144
VIII    stimulus 1, first period      -.99    7.91   -.95     14
VIII    stimulus 1, second period     -.98    3.64   -.98     14
VIII    stimulus 2, second period     -.98    6.63   -.93     14
VIII    All three                     -.98    6.33   -.94     42
IX      chocolate                     -.40    1.88   -.63     16
IX      vinegar                       -.40    1.93   -.62     16



















r_IC, correlations between amount of liking of stimuli at the beginning (I), and 
amount of change before the (C), i.e., between Columns I and C of Table XV, and 
the like. 
R_IC, correlations inferred by formula Kelley 186 from r and given SD, as of an 
SD of 3.5, taken as typical of the column SD's. 
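
The note above names Kelley's formula 186 without reproducing it. As a sketch only, and assuming that the formula is the familiar correction of a correlation for a change in the spread of the group, the inferred values can be recomputed as follows. The function name is hypothetical; the three rows are taken from Table XVIII, and for these rows the assumed formula agrees with the tabled R_IC to two decimals.

```python
import math


def correct_for_range(r, sd_observed, sd_target):
    """Correlation expected if the group's SD were sd_target instead of sd_observed.

    Assumed form (not quoted in the source): R = r*k / sqrt(1 - r^2 + r^2 * k^2),
    where k = sd_target / sd_observed.
    """
    k = sd_target / sd_observed
    return r * k / math.sqrt(1.0 - r * r + r * r * k * k)


# (stimulus, observed r_IC, observed SD) as read from Table XVIII
rows = [("raisins", -0.86, 4.06), ("square", -0.30, 5.09), ("orange juice", -0.89, 1.83)]

for name, r, sd in rows:
    # 3.5 is the SD the note takes as typical of the column SD's
    print(name, round(correct_for_range(r, sd, 3.5), 2))
```

A small observed SD inflates the corrected value and a large one shrinks it, which is the pattern the R_IC column shows.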
THE INTELLIGENCE AND ACHIEVEMENT OF 
PRIVATE SCHOOL PUPILS 


WALTER F. DEARBORN AND PSYCHE CATTELL 


Psycho-educational Clinic, Harvard University 


In the days when group intelligence tests were first being used in 
schools, it was not uncommon to have them criticized on the basis 
that the results were not in accord with the judgments or opinions of 
teachers. One private school teacher stated, for example, that he was 
sure that there must be something the matter with the tests because in 
a try-out, which he had made of them, it appeared that his classes were 
composed chiefly of near-geniuses, and he knew very well that was 
not the case; on the contrary, he was sure that his pupils were, on the 
whole, a rather ordinary lot. He had found that, according to the 
tests, his pupils had averaged nearly 125 in Intelligence Quotients. 
It was, however, not difficult to show that this finding was really in 
accord with other evidence in regard to their intellectual caliber, that, 
e.g., practically all of the graduates of this school took high place in 
the college entrance examinations and later on in their college studies. 

Even now the facts in regard to the intelligence of private school 
pupils are not generally recognized by their teachers, and the relation 
of intelligence to achievement is even less considered. It is, for 
example, not unusual to find that the norms of the standardized 
achievement tests (which represent the average accomplishment of 
public school children) are used as if they were applicable to private 
school children. 

Although this study is of private school children in the vicinity 
of Boston, it is probably not unrepresentative of the better private 
schools in the country at large. The majority of the pupils of the 
private schools are drawn from the homes of the well-to-do. The 
poor are largely excluded on account of the tuition fees and with 
them many of poor heredity. Hence the private school pupils not 
only have a home environment but also a hereditary background which 
is superior to the average. This study of their intelligence and school 
achievement is based on the chronological age, grade status, and intelli- 
gence test results of 1295 pupils from twelve schools and the achieve- 
ment test results from three schools, about 300 pupils. 

The intelligence test used with most of the Kindergarten and a 
few of the first grade children was the Stanford-Binet. The other 
pupils were given the Dearborn Group Tests of Intelligence. Either A 
or both A and B of Series I were given in Grades I to III inclusive, 
and either C or both C and D of Series II in Grades IV to XII inclusive. 
Unfortunately the achievement tests used varied with the school. 
They will be described in the section dealing with achievement. 


TABLE I.—THE INTELLIGENCE QUOTIENTS OF PRIVATE SCHOOL PUPILS 
Medians, Quartile Points and Quartile Deviations by Schools 

School             N      Grades          Median   Upper      Lower      Quartile
                          (inclusive)              quartile   quartile   deviation

A                  164    I-VIII          120      133        107        13.3
B                  108    ? (upper)       118      123        108         7.3
C                   12    K               127      138        115        11.2
D                  189    I-XI            122      131        113         9.1
E                  110    V-VIII          118      127        108         9.4
F                   78    K-VIII          119      132        113         9.3
G                   28    ? (upper)       128      134        116         8.9
H                  200    VIII-XII        109      118         99         9.4
I                  143    VII-XII         125      132        119         6.7
J                   69    ? (forms 1 & 2) 118      126        110         8.2
K                  105    V-XII           120      128        112         7.8
L                   90    I-VII           114      125        105        10.3
Total             1295    K-XII           119      128        109         9.7

Public schools    3623    II-XII          103      114         91        11.5











Table I gives the median intelligence quotients, the quartile 
points and the quartile deviations for each school. For purposes of 
comparison the same information is given for 3623 public school 
pupils at the bottom of the table. It will be noted that the 
semi-interquartile range is larger for the public than for the private 
school pupils. This may be accounted for by the fact that all grades of 
intelligence are found in the former while the lower ranges are largely 
excluded from the latter. The medians are plotted in Figure 1, and 
the frequency distribution on which the calculations are based in 
Figure 2. The light lines are the distributions for the individual 
schools, the heavy solid line the distribution for the 1295 pupils from 
all twelve schools, and the heavy broken line a frequency distribution 
of the IQ's of all the school children in three Massachusetts towns, 
including 3623 cases. The median IQ for the several schools varies 
from 109 to 128, while the median for the whole group is 119. 
According to Professor Terman's classification there is only one 
school in which the median pupil falls below the lower limit of the 
superior group, and that by only one point, while the median child of 
six of the twelve schools falls in the very superior group, and the median 
for all the pupils comes within one point of falling in the very superior 
group. Table II gives the percentage of private school pupils falling 
in the several IQ groups, and Figure 3 is a percentile graph giving the 
per cent of private and public school children having IQ's above and 
below any given point. 

Fig. 1.—Median intelligence quotients of pupils in private schools. 


TABLE II.—PERCENTAGE DISTRIBUTION OF IQ'S 

IQ          FREQUENCY IN PER CENTS 
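
The two computations behind the discussion of Table I can be sketched briefly. The quartile deviation is half the distance between the quartile points, and the classification boundaries used below (110 for the superior group, 120 for the very superior group) are assumptions inferred from the remarks in the text about medians of 109 and 119; they are not quoted from Terman. The function names are hypothetical, and the medians are as read from Table I.

```python
def quartile_deviation(upper_quartile, lower_quartile):
    """Semi-interquartile range: half the distance between the quartile points."""
    return (upper_quartile - lower_quartile) / 2.0


def terman_group(iq):
    """Rough grouping; the 110 and 120 boundaries are inferred from the text."""
    if iq >= 120:
        return "very superior"
    if iq >= 110:
        return "superior"
    return "below superior"


# Median IQ by school, as read from Table I
school_medians = {
    "A": 120, "B": 118, "C": 127, "D": 122, "E": 118, "F": 119,
    "G": 128, "H": 109, "I": 125, "J": 118, "K": 120, "L": 114,
}

very_superior = sorted(s for s, m in school_medians.items() if terman_group(m) == "very superior")
below_superior = sorted(s for s, m in school_medians.items() if terman_group(m) == "below superior")

print(quartile_deviation(114, 91))  # public school quartile points from Table I
print(very_superior, below_superior)
```

Computed from the rounded quartile points, the deviation for some schools differs by a few tenths from the tabled figure, presumably because the table was worked from unrounded values.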

If pupils as highly selected as these did not accomplish work 
above that of the unselected pupils found in the public schools, the 
conclusion that the private schools were not making the most of their 
opportunities would be inevitable. A more pertinent question is 
whether or not their achievement is as far beyond that of the public 
school child as their superior mental equipment would appear to 
indicate was within their capacity. 

Fig. 3.—Percentile graphs of the intelligence quotients of private and public school 
children. 

The mental and chronological ages are plotted by grades in Figure 4. 
The broken lines show the medians for the private schools and the 
solid lines the grade norms for the unselected children. The 
chronological-age grade norms are those given by Gates in "The 
Improvement of Reading" for the beginning of the school year. The 
mental-age grade norms are plotted on the assumption that mental growth 
ceases on the average at fourteen and a half and that the average IQ 
is 100. The curve given as the norm for the mental ages is not 
exactly that found in the public schools. Actually the median IQ 
is slightly below 100 in the lower grades owing to the holding back of 
the slow pupils, and in the upper grades it is above 100, owing to the 
dropping out of the poorer pupils. 
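
The assumption that mental growth ceases at fourteen and a half can be written out as a small sketch. It is assumed here, not stated in the text, that the same limit serves as the divisor when the IQ of an older pupil is computed; the function names are hypothetical. Under that assumption a mental age of 17-0 at a chronological age of sixteen gives a quotient of about 117.

```python
GROWTH_LIMIT_YEARS = 14.5  # age at which mental growth is assumed to cease


def norm_mental_age(chronological_age_years, limit=GROWTH_LIMIT_YEARS):
    """Mental age of the average (IQ 100) child under the stated assumption."""
    return min(chronological_age_years, limit)


def intelligence_quotient(mental_age_years, chronological_age_years, limit=GROWTH_LIMIT_YEARS):
    """IQ = 100 * MA / CA, with CA capped at the assumed growth limit."""
    return 100.0 * mental_age_years / norm_mental_age(chronological_age_years, limit)


print(norm_mental_age(10.0), norm_mental_age(16.0))
print(round(intelligence_quotient(12.5, 10.0)), round(intelligence_quotient(17.0, 16.0)))
```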
Fig. 4.—A comparison of the mental and chronological ages of private and public school 
pupils. 


Most of the intelligence tests were given near the beginning of 
the school year, but this was not true of all the schools. The median 
time of the examinations for all the pupils was in late November or 
after the lapse of about one-fourth of the school year. Therefore, 
in Figure 4, instead of plotting the median mental and chronological 
ages directly over the numeral representing the grade, they have been 
plotted one-fourth of the distance between that grade and the next. 

The median private school pupil is accelerated in grade status 
from one month to one year according to his chronological age, but 
according to his mental age he is retarded from one-third to one and 
two-thirds years. The retardation may be due to the fact that the 
child has not been in school as many years as has the average child 
of the same mental age but older chronological age, and hence has been 
prevented from acquiring an equal amount of knowledge by mere 
lack of time. It may equally have been caused by the failure 
of the schools to provide the opportunity for him to advance as rapidly 
as his ability would permit. Which is the case is not clear from these 
data, but the study of School D reported below indicates that if lack 
of years of life is a handicap it is one that can be overcome. 

With the purpose of determining whether or not the intelligence 
quotients of private school pupils varied from age to age, the median 
intelligence quotients were calculated for one-year age groups and are 
given in Table III, together with their probable errors, the quartile 
points and quartile deviations. The medians are plotted in heavy 
black lines in Figure 5. The regularity of the pattern is striking: 
the median IQ rises from the age of five to eleven, drops steadily 
until fourteen, and then rises steadily again from fourteen to eighteen. 


TABLE III.—PRIVATE SCHOOL PUPILS 
Mental Ages and Intelligence Quotients by Ages 

                Mental ages                                   Intelligence quotients
Age   N     Median  PE    Upper     Lower     Quartile    Median  PE    Lower     Upper     Quartile
                          quartile  quartile  deviation                 quartile  quartile  deviation

 5     29    6-4    1.4    6-10      5-10     0-6.1       112     1.9   103       120        8.3
 6     70    7-7    1.0    8-2       7-1      0-6.7       115     1.2   108       124        7.8
 7     59    8-9    1.7    9-7       7-11     0-10.2      114     2.1   103       129       12.8
 8     62   10-1    1.9   11-2       9-2      0-11.9      119     2.1   104       131       13.2
 9     92   11-7    1.7   12-8      10-6      1-1.2       122     1.8   110       137       13.6
10     79   12-7    2.0   14-0      11-7      1-2.3       122     1.6   113       135       11.0
11     78   14-9    2.0   16-2      13-9      1-2.4       129     1.3   120       138        9.0
12    120   15-6    2.0   16-9      13-11     1-5.1       123     1.3   112       133       10.9
13    153   16-1    1.3   17-1      14-11     1-0.9       120     0.8   110       126        7.8
14    137   16-5    1.5   17-5      15-1      1-1.8       114     0.8   104       120        8.0
15    122   17-0    1.8   17-11     15-4      1-4.0       116     1.1   104       123        9.6
16    108   17-0    1.5   17-10     15-8      1-0.6       117     1.0   108       125        8.4
17     82   17-4    1.3   18-2      16-8      0-9.3       119     0.9   113       126        6.3
18     48   17-10   2.4   18-9      16-6      1-1.3       122     1.4   114       129        7.5
19     18   16-6    4.1   17-7      15-3      1-2.0       116     2.2   107       122        7.6

The drop at nineteen is probably caused by the graduation of the 
brighter pupils. There is, however, a complicating factor in that 
the pupils came from different schools and the schools do not all have 
the same age range. One school will take pupils from the Kindergarten 
through Grade II, another from Grade IV to Grade VIII and a third 
from Grade VI through Grade XII, etc. Therefore if a school were 
included in which there was an unusually strict selection of pupils 
and in which there was a narrow age range with a median at eleven 
years, it might account for the rise in IQ's from age to age with the 
peak at eleven. A check on this point appears to indicate that this 
may be partly the cause, but probably not the only cause, of the rise 
and fall of the intelligence quotients. 


TABLE IV.—PRIVATE SCHOOL PUPILS 
Chronological and Mental Ages and Intelligence Quotients 

               Chronological age               Mental age                      Intelligence quotients
Grade   N      Median  Q3     Q1     Q         Median  Q3     Q1     Q         Median  Q3    Q1    Q

K        27     5-2     5-7    4-9   0-5.4      6-0     6-8    5-8   0-5.9     119     128   111    8.3
I        84     6-6     6-10   6-1   0-4.3      7-4     7-11   6-8   0-7.6     113     123   106    8.1
II       62     7-8     8-0    7-4   0-4.0      8-10    9-10   7-10  0-11.5    116     130   103   13.1
III      57     8-9     9-0    8-4   0-4.2      9-9    11-0    9-2   0-11.1    113     125   104   10.4
IV       75     9-7    10-0    9-1   0-5.1     11-7    12-6   10-6   1-0.3     120     134   110   11.9
V        81    10-6    10-11   9-10  0-6.4     12-11   13-9   11-6   1-1.3     121     134   109   12.4
VI       76    11-7    12-2   11-0   0-7.0     14-1    15-4   13-2   1-1.2     124     134   112   11.2
VII     120    12-5    13-1   12-0   0-6.5     15-6    16-5   14-4   1-0.6     124     134   113   10.8
VIII    126    13-7    14-1   12-11  0-7.0     16-7    17-4   15-7   0-10.6    123     125   116    4.6
IX       41    14-3    14-11  13-8   0-7.4     17-0    17-7   16-3   0-7.8     119     127   115    5.6
X        41    15-6    16-1   15-1   0-5.9     17-10   18-7   16-9   0-11.1    123     130   117    6.5
XI       41    16-4    16-11  15-8   0-7.5     18-1    18-6   17-3   0-7.5     124     128   119    4.4
XII      30    17-8    18-3   16-8   0-9.8     18-0    19-0   17-2   0-10.9    125     133   119    7.2

















Fig. 5.—Median intelligence quotients of private school pupils by ages. 

Fig. 6.—Median intelligence quotients of private school pupils by grades. 

The medians were recalculated using only those schools whose 
median child had an intelligence quotient within two points of 119, 
which is the median of the total 1295 cases from all the schools; in 
other words, the pupils in all the schools whose median child had an 
intelligence quotient of 117, 118, 119, 120 or 121. The results are 
plotted in light lines in Figure 5. There is still some indication that 
the pupils between the ages eight and thirteen are more highly selected 
than are those between five and seven and between fourteen and 
sixteen inclusive. Unfortunately this procedure reduced the number 
of cases to a little less than half; the number in the different age groups 
varied from twenty to seventy-eight. The small number of cases 
may in part account for the less regular rise and fall of the medians 
from age to age. 
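
The selection just described can be retraced from Table I. The sketch below keeps the schools whose median IQ lies within two points of 119 and totals their pupils; the figures are as read from Table I, and the variable names are hypothetical. The selected schools account for 634 of the 1295 pupils, which agrees with the remark that the procedure left a little less than half of the cases.

```python
# (number of pupils, median IQ) by school, as read from Table I
table_i = {
    "A": (164, 120), "B": (108, 118), "C": (12, 127), "D": (189, 122),
    "E": (110, 118), "F": (78, 119), "G": (28, 128), "H": (200, 109),
    "I": (143, 125), "J": (69, 118), "K": (105, 120), "L": (90, 114),
}

GROUP_MEDIAN = 119
TOLERANCE = 2  # keep medians of 117 through 121 inclusive

selected_schools = sorted(
    school for school, (_, median_iq) in table_i.items()
    if abs(median_iq - GROUP_MEDIAN) <= TOLERANCE
)
selected_pupils = sum(table_i[school][0] for school in selected_schools)

print(selected_schools, selected_pupils, round(selected_pupils / 1295, 2))
```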

A comparison of the median IQ’s of the different grades is shown 
in Figure 6. Again the heavy black lines indicate the medians for 
all the pupils and the light lines the medians for the pupils who come 
from schools whose median pupil’s IQ lies between 117 and 121 inclu- 
sive. Here no distinct tendency for the intelligence quotients to 
vary with age is found except in the first and second grades where the 
selection appears to be less rigorous than in the later grades. 


ScHOOL ACHIEVEMENT 


It has been seen that the private school pupils as a group consist 
of children of superior intelligence. The lowest median IQ found in 
any of the twelve schools was 109, the highest 128 and the median 
of the whole group of 1295 children was 119. Schools entrusted with 
the education of this highly endowed group have a grave responsibility 
as well as a wonderful opportunity. 

Unfortunately achievement tests had been given in only three 
of the schools and the tests given in these were not the same for each 
school. The most satisfactory treatment of the test results was 
possible in School D. The educational test used was the Stanford 
Achievement Test, which covers a number of subjects and for which 
there were reliable age norms. Data were obtained for three grades only. 
The pupils in Grade III were given the Dearborn Intelligence 
Examination A and the primary form of the Stanford Achievement Test; 
those in Grades IV and V, Dearborn C and the advanced form of the 
Stanford Achievement Test. 

The test results are summarized in Table V. The pupils of this 
school compare favorably with other pupils not only of similar 
chronological ages, but also of similar mental capacity. As in the other 
private schools the pupils are a select group; the median IQ of those in 
the three grades under consideration is 116. It has usually been found 
that pupils with low IQ's have educational ages above their mental 
ages while those of high IQ have educational ages below their mental 
ages, so that those with a high IQ have a low AQ (EA ÷ MA) 
and those with a low IQ have a high AQ. It is even claimed 
by some authorities that this must of necessity be the case. But we 
have here a group of superior children with a median educational 
quotient of 122, which is six points higher than their median IQ. The 
median AQ is just 100,¹ indicating that as many of these pupils are 

TABLE V.—PRIVATE SCHOOL D 

                                      Grade III     Grade IV      Grade V       Total
Mental ages
    M                                 9-4           10-8          12-6
    Q3                                10-0          11-4          13-0
    Q1                                8-10          10-0          11-8
    Q                                 7.0           8.2           7.6
Chronological ages
    M                                 8-0           9-3           10-0
    Q3                                8-8           9-8           10-7
    Q1                                7-9           8-10          9-8
    Q                                 5.5           5.0           5.2
Intelligence quotients
    M                                 114           114           122           116
    Q3                                122           124           131           126
    Q1                                107           106           113           108
    Q                                 [illegible]   [illegible]   [illegible]   8.6
Stanford achievement grades
    M                                 3.9           4.7           6.5
    Q3                                [illegible]   [illegible]   [illegible]
    Q1                                3.4           4.3           6.1
    Q                                 0.5           0.5           0.5
Stanford educational quotients
    M                                 122           120           122           121
    Q3                                128           [illegible]   [illegible]   [illegible]
    Q1                                [illegible]   110           116           116
    Q                                 [illegible]   7.8           6.3           6.1
Stanford accomplishment quotients
    M                                 103           101           98            100
    Q3                                109           108           104           108
    Q1                                98            [illegible]   [illegible]   96
    Q                                 6.0           6.2           4.2           6.8





doing work above, as below, the average child of the same mental 
age, but who are older chronologically. The teachers appear to be 
accomplishing results which are worthy of their superior pupils. It is 
interesting to note that while the accomplishment quotients are higher 





¹ The AQ is not over 100 even though the median educational age is higher 
than the median mental age, on account of the skewness of the curves and because 
there were not the same number of pupils included in each median. Some had 
taken the intelligence test but not the educational test, or vice versa; in these cases 
the accomplishment quotient obviously could not be calculated. 
















208 The Journal of Educational Psychology 


than is usually found among pupils with high IQ's, there nevertheless 
exists a negative coefficient of correlation (ρ) of 0.52 between IQ 
and AQ. 
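
The three quotients used in this section are related as ratios of ages, and the footnote's point, that a median AQ need not equal the ratio of the median EQ to the median IQ, can be seen in a small sketch. The five pupils below are entirely hypothetical and serve only to illustrate the arithmetic; ages are in months.

```python
from statistics import median


def quotient(numerator_age_months, denominator_age_months):
    """Generic quotient: IQ = MA/CA, EQ = EA/CA, AQ = EA/MA, each times 100."""
    return 100.0 * numerator_age_months / denominator_age_months


# Hypothetical pupils: (chronological age, mental age, educational age) in months
pupils = [(96, 104, 112), (100, 120, 114), (108, 130, 128), (110, 121, 133), (120, 150, 141)]

iqs = [quotient(ma, ca) for ca, ma, ea in pupils]
eqs = [quotient(ea, ca) for ca, ma, ea in pupils]
aqs = [quotient(ea, ma) for ca, ma, ea in pupils]

median_aq = median(aqs)                                  # median of the individual AQ's
ratio_of_medians = 100.0 * median(eqs) / median(iqs)     # what the medians alone would suggest

print(round(median_aq, 1), round(ratio_of_medians, 1))
```

For any single pupil the AQ equals 100 times EQ divided by IQ, but the medians of the three columns generally belong to different pupils, so the same identity does not carry over to the medians.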

Figure 7 gives the frequency distribution of the IQ's of all the pupils 
in the three grades; the broken line is the distribution of the educational 
or accomplishment quotients. 

Fig. 7.—Distributions of intelligence and accomplishment quotients. 

The Stanford Tests were given in December, hence the norms for 
the respective grades at the time were 3.3, 4.3 and 5.3. According to 
the results of the achievement test the pupils had accomplished 3.9, 4.7 
and 6.5 grade work respectively. The semi-interquartile range was 
five months in each case. In Grade III the average pupil was six months 
ahead of the Grade III public school child, the median Grade IV child 
was four months and the median Grade V child was one year and two 
months ahead of the grade norm. The median AQ of Grade III was 
103, of Grade IV 101 and of Grade V 98. The explanation for the 
lower AQ in Grade V is probably to be found in the fact that the pupils 
are placed in a grade the norm of which is 1.2 years below their median 
mental age and they have, therefore, probably not been presented 
in school with that material, a knowledge of which it would be 
necessary for them to acquire before they could obtain an AQ over 100. 
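
The comparison with the grade norms is simple subtraction on the grade scale. The sketch below assumes, as the figures in the text imply, that one tenth of a grade stands for one school month, so that twelve tenths read as one year and two months. Grade scores are held as whole tenths to avoid rounding trouble; the names are hypothetical.

```python
# December grade norms and median achieved grades, in tenths of a grade
norm_tenths = {"III": 33, "IV": 43, "V": 53}
achieved_tenths = {"III": 39, "IV": 47, "V": 65}


def months_ahead(grade):
    """School months by which the median pupil exceeds the grade norm."""
    return achieved_tenths[grade] - norm_tenths[grade]


def as_years_and_months(school_months):
    """Split school months into (years, months), counting ten months to the school year."""
    return divmod(school_months, 10)


for grade in ("III", "IV", "V"):
    print(grade, months_ahead(grade), as_years_and_months(months_ahead(grade)))
```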

The median educational and accomplishment quotients of some 
of the separate tests of the battery are shown in Table VI. The 
median educational quotient in arithmetic is 116, in reading 121 and 
in language usage 140, the median AQ's 97, 103 and 116, respectively. 
The extraordinarily high quotient in language usage has probably 
resulted from hearing correct English spoken at home. In this test 
the pupils are given a number of sentences in two forms and are asked 
to underline the correct one. A number of Grade IV and Grade V 
pupils who earned a "language usage age" of 14 or 16 were making 
less than average marks in English grammar. They had learned to 
distinguish what "sounds right" from what "sounds wrong," but in 
their school examinations were unable to tell the rules. 


TABLE VI.—PRIVATE SCHOOL PUPILS 
Median Stanford Achievement Test AQ's and EQ's by Grades 

                     EQ                                            AQ
Grade    N     Arith.   Arith.   Total    Total    Lan-      Arith.   Arith.   Total    Total    Lan-
               funda-   reason-  arith-   read-    guage     funda-   reason-  arith-   read-    guage
               mentals  ing      metic    ing      usage     mentals  ing      metic    ing      usage

III      18    118      115      118      122      ...       100       98       99      107      ...
IV       25    113      110      112      112      132        98       98       98       96      117
V        22    116      115      116      130      143        94       94       93      108      116
Total
  M            116      114      116      121      140        97       97       97      104      116
  Q3           120      123      120      132      153       105      103      105      112      125
  Q1           110      107      109      112      126        91       91       90       95      105
  Q            4.8      8.0      5.2      10.2     13.2       7.2      5.9      7.3      8.2      10.3























The median educational and accomplishment quotients of Schools 
A and F are given in Tables VII and VIII. The median IQ of the 
pupils in School A is 120 and in School F is 119. Although the ability 
of the pupils is thus seen to be above that of School D, the median 
educational quotients are lower. As measured by the Peet-Dearborn 
Tests the educational quotients in arithmetic are only slightly above 
those of the public school child. In one school the quotient was four 
points higher in the problems and three in the fundamental processes; 
in the other school the differences were one and two points respectively. 
In arithmetic, at least, they had profited but little from their superior 
mental ability. The median AQ was approximately 87 in both schools. 

The educational quotients are somewhat higher in reading. The 
Haggerty primary reading test was given in Grades II and III of 
both schools. The median EQ's were 112 and 118. This brings the 
AQ up to 95 in the first school, but since Grades II and III of the second 
school have especially high median IQ's the median AQ is only 89. 
The Thorndike-McCall reading test was given to Grades III to VIII 
of School A. The median EQ was 122 and the median AQ was 98. 


TABLE VII.—PRIVATE SCHOOL A 
Median EQ's and AQ's by Grades 

                        EQ                                             AQ
Grade   N             P-D          P-D          Hag-     T-M      P-D          P-D          Hag-     T-M
                      funda-       arith-       gerty    read-    arith-       arith-       gerty    read-
                      mental       metic        read-    ing      metic        metic        read-    ing
                      arith-       reason-      ing               funda-       reason-      ing
                      metic        ing                            mental       ing

I       23            92           [illegible]  ...      ...      [illegible]  89           ...      ...
II      21            102          96           112      ...      92           86           96       ...
III      9            106          101          115      118      89           88           85       90
IV      23            103          101          ...      121      86           87           ...      104
V       22            106          110          ...      132      77           83           ...      99
VI      [illegible]   ...          ...          ...      126      ...          ...          ...      102
VII     19            ...          ...          ...      126      ...          ...          ...      96
VIII    21            ...          ...          ...      116      ...          ...          ...      97
Total
  M                   102          101          112      122      87           87           95       98
  Q3                  109          115          125      134      95           98           110      107
  Q1                  92           89           98       112      78           77           84       91
  Q                   8.2          13.1         13.3     10.5     8.8          10.8         13.3     8.3


























In all three schools the pupils stand relatively higher in reading 
than in arithmetic. The achievement of the pupils of School A in 
reading is about equal to that of the average child of the same mental 
age but of average IQ; in School D it is above, and in School F below. 
In arithmetic the private schools compare less favorably with the public 
schools. None of the three schools has a median AQ in arithmetic 
that comes up to the norm; in School D it was 97, while in Schools 
A and F it was 87. It is possible that the greater accomplishment 
in reading than in arithmetic is due to the teaching in this subject 
being superior to that offered in arithmetic, but it appears more likely 
that the superior out-of-school environment of the private school 
pupil has more influence on achievement in reading than in arithmetic. 

TABLE VIII.—PRIVATE SCHOOL F 
Median EQ's and AQ's by Grades 

                       EQ                                    AQ
Grade   N             P-D          P-D                     P-D          P-D
                      arithmetic   arithmetic   Haggerty   arithmetic   arithmetic   Haggerty
                      fundamental  reasoning    reading    fundamental  reasoning    reading

II       8            102          110          110        75           90           85
III      9            99           101          119        78           82           91
IV      [illegible]   110          110          ...        93           90           ...
V        9            107          100          ...        88           82           ...
Total   34            103          104          118        88           86           89

In conclusion it may be pointed out that it is not only possible to 
bring the accomplishment of superior children up to that of the average 
child of the same mental age but lower IQ, but that with appropriate 
methods of instruction the AQ may be brought up to well over 100. 
This has been demonstrated by School D, whose pupils in the three 
grades studied had a median IQ of 116 and a median AQ on the Stanford 
Achievement Test battery of just 100; on certain of the tests, as 
noted above, it was still higher, namely, 104 in reading and 116 in 
language usage. 
VOCABULARY TESTS* 


NOEL B. CUFF 


Eastern Kentucky State Teachers College 


I. THE PROBLEM OF MEASUREMENT 


Two commonly accepted theses are: "Whatever exists at all exists 
in some amount" and "Anything that exists in amount can be 
measured." For a long time, however, efforts have been made to find out 
how many words children of a given age or school grade know. The 
results of studies by distinguished scholars have failed to agree. Many 
of the early estimates were only wild guesses. For example, Orsey, 
an English clergyman, stated that some of his parishioners used scarcely 
more than three hundred words and Hale made the statement that 
English workmen get along with one hundred words. George P. 
March reached the conclusion, which is also now known to be grossly 
erroneous, that "Few writers or speakers use as many as ten thousand 
words, ordinary persons of fair intelligence not above three thousand 
or four thousand." 

A number of attempts have been made to obtain complete vocabu- 
laries of young children by some such procedure as the following: An 
observer plans to be with a child constantly during his waking hours 
for a period of at least two weeks and to note down every word used 
by the child. During this time the child is stimulated by a variety 
of situations which it is assumed evoke the full use of speech. It is, 
of course, true that the vocabulary understood by the child is larger 
than his spoken vocabulary, and that averages are of little value where 
numbers are small. However, the following table, which is somewhat 
misleading and inaccurate, is presented as a summary of a number of 
such attempts to secure detailed vocabularies of little children. 

These figures are no doubt above the average, for the children 
studied were born of competent professional people and had the 
advantage of a favorable environment. But three other investigations on 
large numbers of children show a total vocabulary of about five 
thousand words. These were made by Ernest Horn, by Mrs. Ernest Horn, 
and by P. C. Packer. The words which have a frequency of 
twenty-five or more in two of the lists and of fifteen or more in three lists 
number eleven hundred twenty. It is stated that the average Grade I 
child can be expected to know all of these words. 

* Read at the Twenty-fourth Annual Meeting of the Southern Society of 
Philosophy and Psychology, March 30, 1929, Lexington, Ky. 


TABLE I.—AVERAGE VOCABULARIES AT DIFFERENT AGES ACCORDING TO VARIOUS 
OBSERVERS 

Age     Number of cases     Average number of words
1       10                     8.9
2       20                   528
3       11                  1338
4        7                  1843
5        2                  4225
6        2                  3103







For diagnostic purposes it would be very helpful to have an instru- 
ment with which to measure the extent of a child’s vocabulary, but 
several difficulties are encountered in the construction of such a test. 
It is not satisfactory to use tests which cannot be objectively scored 
or which require a large amount of time on the part of the pupil and 
the scorer. Some ingenious types of performances used by the makers 
of vocabulary tests will be described and an effort will be made to 
evaluate certain principles of method. 


II. NATURE OF THE TESTS 


Kirkpatrick, Doran, Bonser, Gerlach, Brandenburg, Terman, 
Neher, Holley, and the writer have estimated the number of words 
known by using tests chosen from some dictionary and then multiplying by 

$$\frac{n \text{ in dictionary}}{n \text{ in the sample}}$$
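
As a minimal sketch of this rule, the estimate is the number of sample words known times the ratio of dictionary size to sample size. The function name is hypothetical and the score of fifty words is invented for illustration; the dictionary and sample sizes are those quoted in this section for the Terman and Childs list and for Kirkpatrick's list.

```python
def estimate_vocabulary(words_known, sample_size, dictionary_size):
    """Scale a score on a sampled word list up to the whole dictionary."""
    return words_known * dictionary_size / sample_size


# Terman and Childs: 100 words sampled from an 18,000-word dictionary,
# so each word known stands for 180 words.
per_word_terman = estimate_vocabulary(1, 100, 18_000)

# Kirkpatrick: 100 words sampled from a 28,000-word dictionary;
# a hypothetical subject knowing 50 of them.
kirkpatrick_example = estimate_vocabulary(50, 100, 28_000)

print(per_word_terman, kirkpatrick_example)
```

The multiplier grows with the size of the dictionary, which is one reason estimates drawn from different dictionaries are hard to compare.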

Thorndike has selected words for tests from a list prepared to 
show the frequency with which the words are used. 

Kirkpatrick (1907) selected a list of one hundred words from 
Webster’s Academic Dictionary which contains twenty-eight thousand 
words. The subjects were given a printed test and were told to mark 
the words which they knew with a plus sign, those that they did not 
know with a minus sign, and doubtful ones with a question mark. 
They were also told to "count as known all words that you would 
not, as to their meaning, need to look up in the dictionary if you saw 
them in a sentence." 

Doran (1907) used lists varying from one thousand to several 
thousand words. He selected pages at random or more often in a 
certain order, as every twenty-fifth or fiftieth page. Webster's 
High School Dictionary was used first and then Webster's International. 
Written or oral definitions were required, and estimations were arrived 
at by multiplying in the same fashion as Kirkpatrick, with an 
allowance for unusual words. 

Bonser, in 1915, tested the Speyer School with Kirkpatrick’s 
list. Later he re-tested the pupils at the Speyer School, and tested 
pupils at Edgewater, Horace Mann, and elsewhere with a list of one 
hundred fifty words selected from Webster’s Elementary School 
Dictionary, which contains forty-four thousand words. Kirkpatrick’s 
method was used also in these tests. 

Terman and Childs (1916) prepared a list of one hundred words. 
They took the "last word of every sixth column" in Laird and Lee's 
Vest-Pocket Webster’s Dictionary, 1904 edition, containing eighteen 
thousand words. Subjects were tested individually by oral definition 
and a subject’s vocabulary was found by multiplying his score in the 
test by one hundred eighty. The estimates are stated for mental ages. 

Gerlach (1917) used one thousand words selected from Funk and 
Wagnalls’ New Standard Dictionary. He estimated that the list was 
representative of two hundred fifty thousand words. No biographical 
or geographical terms were included. The six hundred least difficult 
words were supplied with four definitions, only one of which was 
correct. The subjects were asked to define the four hundred most 
difficult words. He deducted from the number of the six hundred 
which were correctly marked, one-third of the number marked incor- 
rectly, and added the number correctly defined of the four hundred. 
Then he multiplied the sum by two hundred fifty. 
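
Gerlach's scoring, as described above, can be sketched as a short function. The deduction of one-third of the wrong choices is the usual allowance for guessing among four alternatives; the function name and the sample scores are hypothetical.

```python
def gerlach_estimate(choice_right, choice_wrong, defined_right, multiplier=250):
    """Estimated vocabulary from Gerlach's one-thousand-word test.

    choice_right  -- number of the 600 multiple-choice words marked correctly
    choice_wrong  -- number of the 600 marked incorrectly
    defined_right -- number of the 400 harder words defined correctly
    multiplier    -- words represented by each test word (250,000 / 1,000)
    """
    corrected_score = choice_right - choice_wrong / 3.0 + defined_right
    return corrected_score * multiplier


# Hypothetical subject: 450 right and 60 wrong on the choices, 100 definitions right
print(gerlach_estimate(450, 60, 100))
```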

Brandenburg (1918) selected one word for every one hundred 
forty words in Webster’s Academic Dictionary. The subject was given 
the list of two hundred words and the following instructions: “In the 
space after each word that you know, write a sentence using the word 
correctly. Place a cross before each word that you do not know.” 

Neher (1918) used the same dictionary and the same methods that 
Terman and Childs used in preparing the vocabulary test for the 
Stanford Revision of the Binet-Simon Tests. 

Holley (1919) used the list prepared by Terman and Childs. The 
Holley Sentence Vocabulary Scale consists of tasks like the following. 
The subject is asked to draw a line under the right word of four, as in: 

An orange is a .......... dress 

The Thorndike Test of Word Knowledge is somewhat similar to 
the test used by Holley. Five possible words are given and the pupil 
in all cases is to underline the word “which means the same or nearly 
the same.” 

The writer selected one hundred words from Webster’s Shorter 
School Dictionary, which contains thirty-five thousand words. The 
editors state that they “have kept constantly in mind the needs of 
the schoolboy and schoolgirl, and the Shorter School Dictionary is 
designed expressly to meet them. While primarily and fundamentally 
a school dictionary, it is sufficiently broad in its vocabulary to be of 
value in the general field.” Since fifty-two of the one hundred words 
chosen by taking the last word of the first column on pages which were 
multiples of five, but not of twenty, are in Thorndike’s list of ten 
thousand words, it is obvious that the vocabulary of the dictionary 
was chosen carefully. One would expect over thirty per cent of the 
words selected from such a dictionary to be in the Thorndike list 
since his count was based on word-forms, not on word-meanings. 
For example, the word red is in the first five hundred of his words and 
the word well also comes in his first five hundred words. The former 
word-form has only one meaning and, of course, every count repre- 
sents that meaning. The latter word-form has, on the other hand, 
several meanings as noun, verb, adjective, and adverb. If the word 
form were applied to only one of these its rank in the frequency list 
would be much lower. Courtis has told us that teaching a child 3 
plus 4 does not insure that we have taught 4 plus 3. This being true 
there seems to be an insidious danger in the condensation of words. 
The fact that a child can define the word corn as a noun does not prove 
that he can define it as a verb, and the fact that a child can give some 
definition of a word or can recognize a simple definition of a word 
does not show that he has a thorough acquaintance with the word. 

As a result of the above conclusions, we (1) selected words from 
the source specified which contains in addition to ordinary words the 
most common words in technical and scientific terminology. This 
was seemingly justified by various vocabulary studies, including 
analyses of articles in American newspapers, which show that boys 
and girls need words that are not in The Teacher’s Word Book to 
read intelligently. It has been asserted that the use of such words 
facilitates “clear, easy, and forceful expression.” (2) We arranged 
the test so that the relative numbers of the parts of speech in the test 
are approximately the same as in the dictionary; expressed in per 
cents: nouns 61, verbs 22, adjectives 12, adverbs 4, pronouns, 
conjunctions, etc., 1. (3) We listed as one of five words or phrases 
the first definition given in the dictionary for a word. If a test 
contains only familiar definitions of words, a person who has a famine 
of words may be given credit for having a large vocabulary. 
The hardware merchant has a gage which will tell instantly, to a 
sixty-fourth of an inch, the caliber of wire. We evidently need greater 
precision in vocabulary tests. The subjects were told to “select 
that one of the words or expressions which most nearly defines the word 
to be defined. Draw a line under the one of the five which is selected 
and write the number of it in the parentheses at the end of the line.” 

The test was given to eleven hundred ten subjects. Of this 
number one hundred eighty-three were pupils in the Training 
School of Eastern Kentucky State Teachers College, two hundred 
twenty-seven were students entering the College, five hundred fifty- 
three were in the Richmond City School, and one hundred forty-seven 
were in the Richmond School for Negroes. The results are not stated 
separately for the schools since most of the differences were not signifi- 
cant. The exceptions to this statement are the scores made by the 
Grade III and IV children. The negroes in these grades had averages 
of 7.3 and 12.5 words respectively while the white children had aver- 
ages of 26.5 and 31.9 words. 

The total vocabulary was found by multiplying the number of 
words selected correctly by 350. Some results are shown in Table II 
with the results from other tests which have been discussed. 
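
The multipliers used in these dictionary-sampling tests follow from the ratio of the words in the dictionary to the words in the test. The Python sketch below restates that arithmetic; the function names are hypothetical, and the only figures used are those given in the text (thirty-five thousand words and a one-hundred-word test for the present study, eighteen thousand words and a one-hundred-word test for Terman and Childs).

```python
def sampling_multiplier(dictionary_words, test_words):
    """Dictionary words represented by each word in the test."""
    return dictionary_words / test_words


def estimate_vocabulary(words_correct, dictionary_words, test_words):
    """Scale a test score up to an estimate of total vocabulary."""
    return words_correct * sampling_multiplier(dictionary_words, test_words)


# Present study: 35,000-word dictionary, 100-word test.
print(sampling_multiplier(35000, 100))
# Terman and Childs: 18,000-word dictionary, 100-word test.
print(sampling_multiplier(18000, 100))
# A group average of 26.5 words correct on the present test.
print(estimate_vocabulary(26.5, 35000, 100))
```

Because the estimate is a simple multiple of the score, the size of the dictionary sampled sets an upper limit on the vocabulary that any subject can be credited with.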


III. FUNCTION OF VOCABULARY TESTS 


Inglis states that his tests render a multiple service in the 
classroom “(1) as a measure of the student’s ability to use language; 
(2) as a guide in classifying students; (3) as an admirable teaching 
device to build vocabulary; (4) as a measure for teacher and student 
of the individual student’s progress.” It is well known that reading 
depends to some extent upon the grasp of content of words. The 
Twenty-fourth Yearbook of the National Society for the Study of 
Education contains this statement: “Growth in reading power means, 
therefore, continuous enriching and enlarging of the reading 
vocabulary, and increasing clarity of discrimination in appreciation 
of word values.” Terman recommends that his vocabulary test be used 
as a brief intelligence scale. He says: “Where a hasty preliminary 
sifting of pupils is necessary it is recommended that the vocabulary 
test be used by itself . . .” The statement is also made that “the 
Stanford vocabulary test gives a mental age correct within one year 
in about sixty per cent of cases, and within a year and a half in 
eighty per cent of cases.” He holds that a high correlation, .91, of 
the vocabulary score with Stanford-Binet mental ages, for 631 school 
children, shows that vocabulary depends upon general intelligence 
rather than upon home environment and formal training. He says too 
that “The vocabulary test has a far higher value than any other 
single test of the scale . . . it probably has a higher value than 
any three other tests in the scale.” 

The various tests differ as to the degrees of acquaintance with the 
words required so that a precise statement of the function of the tests 
is not possible. 


IV. LIMITATIONS OF VOCABULARY TESTS 


The more recent vocabulary tests are highly objective. The tests 
which have been discussed are, however, not totally subjective or 
totally objective. Some of the tests are more objective than others. 
In such tests as Holley’s the personal equation has been almost 
eliminated. The method of scoring does not leave room for the 
exercise of much judgment. 

The reliability of vocabulary tests has not been given much 
attention. McCall reports a coefficient of reliability of .53 based 
upon 88 pupils in Grade VI for Thorndike’s Visual Vocabulary Scale A. 
Wyman and Wendell report a reliability coefficient of .79 ± .04 for 
Thorndike’s Visual Vocabulary Scale B. Terman says of his test that 
tests of the same seventy-five individuals with five different 
vocabulary tests of the same type showed the average difference 
between two tests of the same person was less than five per cent. 
“This,” he says, “means that any one of the five tests used is 
reliable enough for all practical purposes.” The coefficient of 
reliability for the test which we arranged is .87, based on one 
application of the test to ninety-nine Grade VII children. The 
self-correlation of the half test was called r_h, 2 was substituted 
for N in Brown’s formula, and the reliability of the whole test was 
calculated. Examination of the scores justified the assumption that 
the halves of the test were approximately equivalent in difficulty. 
Since the chances are ninety-nine in one hundred that the true 
average lies within the limits of 4PE_av, it is obvious from the 
data that our averages are reliable. 
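
The step from the half-test correlation to the reliability of the whole test can be illustrated briefly. The Python sketch below assumes that "Brown's formula" is what is now usually called the Spearman-Brown formula, r_N = N r / (1 + (N - 1) r); the function names are hypothetical. The half-test correlation itself is not reported in the text, so the sketch works backward from the reported whole-test coefficient of .87 and then checks that the forward step recovers it.

```python
def spearman_brown(r_part, n=2):
    """Reliability of a test n times as long as the part with self-correlation r_part."""
    return n * r_part / (1 + (n - 1) * r_part)


def half_from_whole(r_whole):
    """Half-test correlation implied by a whole-test reliability (inverse for n = 2)."""
    return r_whole / (2 - r_whole)


# Whole-test reliability reported in the text.
r_whole = 0.87
r_half = half_from_whole(r_whole)
print(round(r_half, 2))
# Stepping the implied half-test value up again recovers the whole-test value.
print(round(spearman_brown(r_half, n=2), 2))
```

The stepped-up coefficient depends on the assumption, noted in the text, that the two halves are approximately equivalent in difficulty.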
There is not much evidence available for even hypotheses 
concerning the validity of the various tests as instruments for 
measuring one’s grasp of the words of the English language. There 
are two questions which are very complex: “What is meant by 
acquaintance with words?” and “How many words are there in the 
English language?” Dr. F. H. Vizetelly, the lexicographer, says: 
“No one knows how many words there are in American speech, or in the 
English language, and no one ever will know, for no one can compute 
them.” We also do 
not know the effect of presenting words in isolation, rather than in con- 
text. It is probable that subjects know fewer words when they are 
presented in isolation than they would if the words were presented in 
context. 

There are many discrepancies among the results of the different 
investigators shown in Table II. These are partly due to the 
dictionary used. It is 
not possible to use all of the words in the language to secure a com- 
prehensive measure, so the test has to be made comprehensive by 
including random samplings. Terman selected words from a diction- 
ary of eighteen thousand words while others have tried to sample 
two hundred seventy-five thousand or more words. Bonser reports 
a vocabulary of 18,704 and of 31,120 for Grade VIII, according to the 
dictionary used. 

The discrepancies are also partly due to the method of the test. 
Most of the tests are planned to find out how many or how difficult 
words one can define, rather than how accurately or how rapidly one 
can define them. Subjects may think they know a word and so check 
it when the Kirkpatrick method is used, who could not define it well 
enough to receive credit on Terman’s test. Where easy selection 
tests are given the vocabulary tends to be high. If the words are 
selected from a limited list, the estimation may be too low because sub- 
jects are likely to know words not in the list. 

In conclusion, if it were possible accurately to evaluate the effects 
of variations in method it is probable that the results would be in 
practical agreement. The variations in method as a rule do not 
invalidate the measures for comparative purposes. 


NATURE OF MIRROR-DRAWING ABILITY: NORMS ON 
MIRROR-DRAWING FOR WHITE CHILDREN BY 
AGE AND SEX 


R. J. CLINTON 


Oregon State College 


It is well-nigh fundamental that the more delicate motor per- 
formances of the hand be carried out under direct visual guidance. 
To work in the dark or with the hand out of view imposes difficulties 
which are readily appreciated. An intermediate situation between 
seeing the hand as it works and not seeing it at all is to view it in the 
mirror. Apparently the first person to make full experimental use 
of this ancient situation of watching the hand in a mirror was 
V. Henri¹ in 1898. He devised the mirror-drawing technique, fully 
discussed the problems and difficulties involved, and showed that the 
results indicated trial and error learning. Dearborn² began working 
with it in 1905 and published his results in 1910. Carmichael³ has 
anticipated me in bringing together notes on the history of the 
method. As a rule it has been used as a means of working on the 
problem of trial and error learning. 

Pyle⁴ in 1923 made a study to get at the capacity to do the 
mirror-drawing. The pattern was one similar to Dearborn’s star 
pattern, and was designed by Pyle for use in his laboratory work. 
The numbers on the pattern ran from 1 to 24 inclusive, and were 
arranged in two circles on the pattern. Numbers 2, 5, 8, 11, 14, 17, 
20, and 23 
make up the small inner circle, and numbers 1, 3, 4, 6, 7, 9, 10, 12, 13, 
15, 16, 18, 19, 21, 22, and 24 make up the large outer circle. A small 
dot located at each numbered point of these circles was to be cut by a 
continuous line from 1 to 24. Each subject’s score was the number 
of lines completed during the interval of time. Pyle extended the 
work from the eighth grade down to the fourth. 

The present work extends the experiment down to and including 
the first grade. We designed a mirror-drawing pattern sheet to use 
in Grades I, II, and III, in which the figures were from 1 to 15. The 
numbers were placed in such positions that the children had no 
difficulty locating them. Each pattern sheet was evaluated on 
comparable groups, and the writer was able to work out continuous 
norms for the ages from six to seventeen inclusive, by use of the two 
mirror-drawing patterns. 

1 Henri, V.: Revue générale sur le sens musculaire. L’Année 
Psychologique, 1898, pp. 504ff. 

2 Dearborn, W. F.: Experiments in learning. Journal Educational 
Psychology, 1910, pp. 374-378. 

3 Carmichael, Leonard: History of Mirror-drawing As a Laboratory 
Method. Pedagogical Seminary and Journal of Genetic Psychology, 
Vol. XXXIV, pp. 90-91. 

4 Pyle, W. H.: “Nature and Development of Learning Capacity.” 1923, 
pp. 48-53. 

In this article we shall deal with the following questions: 

1. The relation of mirror-drawing ability to general intelligence. 

2. The relation of mirror-drawing ability to motor speed as shown by speed 
in making short marks and by writing. 

3. What is the sex difference in ability to do the mirror-drawing? 

4. What is the sex difference in simple motor speed? 


5. We shall set forth the age and grade norms for mirror-drawing, marking 
speed, and speed in writing. 


METHOD OF THE STUDY 


The data were obtained from 1903 unselected students in four 
school systems. Each student was allowed to work for a period of 
five minutes on the mirror-drawing pattern sheet and the number of 
lines completed correctly in that time was his score. If a student was 
able to complete more than one pattern in the five minute period, he 
was provided with an additional sheet. In order to get the marking 
speed, each student made short marks for one minute. That method 
was very similar to the much-used one of dotting. The letter speed 
was obtained by having the students write their names over and 
over again as many times as they could in two minutes. The total 
number of letters made was the score. The apparatus for mirror- 
drawing was that commonly used in the laboratories. It consisted 
of a stationary mirror on a drawing board, and a small board shield to 
prevent the subject from looking directly at the hand while drawing. 

The experiments were given to both the white and negro children 
of four school systems, but we shall deal in this article only with the 
results of the white children. 


THE RELATION OF MIRROR-DRAWING CAPACITY TO GENERAL 
ABILITY 


The first question that we shall deal with is the relation of 
mirror-drawing capacity to general ability. Burt and Moore¹ reported 
a correlation of .60 between mirror-drawing and general intelligence 
scores. The data were obtained in grade schools in England. Schott¹ 
correlated mirror-drawing scores with general intelligence ratings of 
a group of fifteen year old boys and girls and got .20, P.E. .08. 

1 Burt, C. and Moore, R. C.: Mental Differences between the Sexes. 
Journal Experimental Pedagogy, 1912, p. 355. 

Calfee² found the following correlations: thirty elementary school 
boys, .07; fifty-two freshmen university girls, .19; fifty-one freshmen 
boys, .07. 

In our study we gave intelligence tests to a group of elementary 
school students, to two groups of high school students, and to two 
groups of university students in educational psychology. In order 
to make a comparison we selected the thirteen highest mentally rated 
boys and the thirteen lowest mentally rated boys in the elementary 
school group; the thirteen highest mentally rated girls and the thirteen 
lowest mentally rated girls in the same group; the fifteen highest 
rated and fifteen lowest rated boys in a high school group; fifteen 
highest rated and fifteen lowest rated girls in the high school group; 
and we selected the twenty-five highest and twenty-five lowest rated 
boys and the twenty-five highest and twenty-five lowest rated girls 
in another high school group. Table I shows the comparisons. 


TABLE I. MIRROR-DRAWING ABILITY 

                          Highest group                       Lowest group 
School          Number  Sex    Average  Mirror-     Number  Sex    Average  Mirror- 
                               IQ       drawing                    IQ       drawing 
[Elementary]      13    boys     120      8.5         13    boys      87     16.7 
Elementary        13    girls    127      5.7         13    girls     91      9.9 
High school       15    boys     107     22.7         15    boys      86     23.1 
High school       15    girls    106     22.5         15    girls     81     27.9 
High school       25    boys     112     28.5         25    boys      91     27.2 
[High school]     25    girls    111     40.0         25    girls     85     26.2 

A study of the table indicates that there is little, if any, 
relation between mirror-drawing ability and general intelligence. In 
most instances the selected group with the lowest mental rating 
scored highest on the mirror drawing. The exceptions are found in the 
second high school group where the brighter boys scored 1.3 points 
higher on the average than the slower group, and where the brighter 
high school girls scored 13.8 points higher on the average than the 
slower girls. One would conclude from such a comparison that the 
relationship is negative. 

1 Schott, E. L.: “The Development of Learning Capacity.” Thesis, 
University of Missouri, 1923, p. 85. 

2 Calfee, M.: College Freshmen and Four General Intelligence Tests. 
Journal Educational Psychology, 1913, p. 227. 

Table II shows all the correlations that resulted from our study of 
the question of the relationship between mirror-drawing ability and 
general intelligence. The correlation coefficients are negative in 
four instances and the positive correlations are small. They indicate 
that there is no correlation of significance between mirror-drawing 
ability and general intelligence. 


TABLE II. SHOWING THE CORRELATIONS BETWEEN MIRROR-DRAWING ABILITY 
AND GENERAL ABILITY 

Number of cases               Group                           Correlation 
                                                              coefficient 
[illegible]                   Elementary school                   .24 
[illegible]                   Elementary school                  -.38 
33 boys                       High school                        -.09 
[illegible]                   [illegible]                         .01 
[illegible]                   High school                         .13 
[illegible]                   [illegible]                         .27 
[illegible]                   Educational psychology              .17 
[illegible]                   Educational psychology             -.17 
[illegible] boys and girls    Bright group (test selected)       -.08 


RELATION OF MIRROR-DRAWING CAPACITY TO MUSCULAR SPEED 


The correlations obtained between the two motor speed tests— 
marking speed and making letters—and mirror-drawing scores are 
rather high for such performances, and seem to indicate that a relation- 
ship exists between them. (See Table III.) 


TABLE III. CORRELATIONS BETWEEN MOTOR SPEED TESTS AND MIRROR-DRAWING 

Speed test           Group                               Correlation 
                                                         with mirror- 
                                                         drawing 
Marking speed        60 high school students             [illegible] 
Making letters       60 high school students                 .26 
Marking speed        100 high school students                .38 
Making letters       100 high school students                .54 
[Marking speed]      137 educational psychology              .18 
[Making letters]     137 educational psychology              .12 

The correlations between mirror-drawing, and marking speed and 
making letters tend to be higher than the correlations between 
mirror-drawing and such performances as marble sorting and card 
sorting. There are more common elements in the mirror-drawing 
performance and writing or marking than there are between 
mirror-drawing and marble or card sorting. 

For further comparison we correlated the mirror-drawing norms 
with the marking speed norms and letter making norms for the corre- 
sponding ages. (See Table IV.) 


TABLE IV. CORRELATION OF MUSCULAR SPEED NORMS WITH MIRROR-DRAWING 
SCORES 

Speed test           Group                Correlation 
                                          with mirror- 
                                          drawing 
[illegible]          Boys’ norms              .91 
[illegible]          [illegible]              .98 
[illegible]          Girls’ norms             .84 
[illegible]          [illegible]              .88 

Since the correlations between the norms for mirror-drawing and 
the two muscular speed norms are so high, we may assume that the 
ability to do the mirror drawing develops along with the muscular 
speed functions. 

Calfee¹ found the following correlations between muscular speed 
tests and mirror-drawing scores: card sorting, elementary school boys 
.26, college freshmen girls .20, college freshmen men .11; and in card 
dealing, elementary school boys .11, college freshmen girls .37, and 
college freshmen men .19. Schott² in a comprehensive piece of work 
with twenty-four bright high school seniors found the following 
correlations: card dealing, .33; marking speed, .45; card sorting, 
.22; marble sorting, .18. 

The results of the two studies are borne out by my results. Schott 
found a higher correlation between marking speed and mirror-drawing 
than between mirror-drawing and the other motor performances. 

1 Calfee, M.: College Freshmen and Four General Intelligence Tests. 
Journal Educational Psychology, 1913, pp. 223-231. 

2 Schott, E. L.: Doctor’s Thesis, University of Missouri, 1926. 


NORMS IN MIRROR-DRAWING, MARKING SPEED, AND LETTER MAKING 
FOR WHITE BOYS AND GIRLS FROM SIX TO SEVENTEEN INCLUSIVE 


The large number of subjects—991—to which we gave the tests 
enabled us to establish rather reliable norms for such performances. 
The ability in the various tests showed a wide range. We shall set 
forth the data for the different ages in the various tests. Table V 
shows the norms for the three performances. 


TABLE V. MIRROR-DRAWING AND OTHER MOTOR TEST NORMS FOR WHITE 
BOYS AND GIRLS FROM SIX TO SEVENTEEN INCLUSIVE 

                 Boys                               Girls 
Age    N    Mirror-   Marks  Letters      N    Mirror-   Marks  Letters 
            drawing                            drawing 
  6    38     4.7      119      54        28     2.9      112      52 
  7    35     6.6      124      74        41     4.9      147      96 
  8    49     9.1      153     123        53     5.9      164     137 
  9    27     9.4      160     150        28     8.3      193     176 
 10    43    10.5      195     179        44     8.9      212     194 
 11    30    14.9      203     191        49    10.9      217     207 
 12    40    14.7      223     200        49    13.8      228     227 
 13    32    15.8      243     228        37    16.1      249     250 
 14    44    20.1      252     234        57    22.6      263     268 
 15    34    22.4      264     241        61    30.7      270     272 
 16    46    24.9      271     250        48    34.4      271     274 
 17    49    30.8      285     268        49    38.6      274     280 

The numbers at the different ages are large enough to give fairly 
reliable norms. At age twelve the boys fail to reach the 
mirror-drawing norm for the eleven year old boys. The twelve year old 
boys are only .2 lower, which is not significant. It might be 
explained by the physiological changes that are taking place in the 
boy’s physique at about the twelve-year level, or by the indifference 
that boys of about twelve years of age manifest toward tasks. We find 
about the same thing happening in the case of the girls two years 
earlier. The norm for the ten year old girls is about the same as 
that of nine. 

Students were found in nearly all age ranges who could do 
practically nothing with the mirror-drawing; whereas others 
experienced little difficulty with the task. For instance in the 
seventeen year group one boy was able to complete only two lines 
while another of the same age was able to complete one hundred and 
three lines. One girl in the seventeen year group was able to 
complete only three lines while another in the same range was able to 
complete one hundred and twenty-four lines. 


TABLE VI. SHOWING THE RANGE IN THE ABILITIES OF WHITE BOYS AND GIRLS BY 
AGE AND SEX IN MIRROR-DRAWING, MARKING SPEED, AND MAKING LETTERS 

                    Boys                                    Girls 
Age    N    Mirror-   Range     Range        N    Mirror-   Range     Range 
            drawing   marking   letters           drawing   marking   letters 
  6    38    0-14     52-185    18-112       28    0-9      70-188    18-108 
  7    35    0-17     57-198    0-204¹       41    0-14     82-246    24-156 
  8    49    0-28     58-246    25-240       53    0-18     95-262    42-248 
  9    27    0-33     84-254    84-252       28    0-55    114-256    66-288 
 10    43    1-41     91-285    73-283       44    0-33     89-288    74-279 
 11    30    1-41    108-262   120-294       49    1-36    135-294   124-273 
 12    40    3-31    151-301   140-275       49    1-46    114-338   132-324 
 13    32    1-76    144-305   140-364       37    0-49    200-290   162-335 
 14    44    1-84    156-316   165-340       57    1-84    165-382   188-406 
 15    34    1-84    169-347   174-368       61    4-82    181-360   192-438 
 16    46    2-80    200-354   175-344       48    3-113   204-382   186-354 
 17    49    2-103   204-341   174-372       49    3-124   201-381   196-368 

1 The subject probably did not understand the directions since the 
next score in the range was twenty-three. 


TABLE VII. SHOWING THE PER CENT OF IMPROVEMENT IN THE MIRROR-DRAWING 
SCORES OF WHITE BOYS AND WHITE GIRLS FROM YEAR TO YEAR 

            Boys                              Girls 
Age    Scores    Per cent           Age    Scores    Per cent 
                 improvement                         improvement 
  6    [4.7]                          6      2.9 
  7     6.6        40                 7      4.9        70 
  8     9.1        38                 8      5.9        20 
  9     9.4         3.5               9      8.3        41 
 10    10.5        11                10      8.9         7 
 11    14.9        42                11     10.9        23 
 12    14.7        -1.3              12     13.8        27 
 13    15.8         8                13     16.1        17 
 14    20.8        27                14     22.6        40 
 15    22.4        11                15     30.8        36 
 16    24.9        11                16     34.4        12 
 17    30.8        23                17     38.6        12 

These percentages are based on the number of cases shown in Table V. 

In Table VII we show the per cent of improvement in the ability 
to do the mirror-drawing from year to year. There is improvement in 
each year except with the twelve year old boys. The per cent of 
improvement from year to year is much more regular with the girls 
than with the boys. 
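
The year-to-year percentages can be recomputed from the mirror-drawing norms. The Python sketch below is offered only as a check on the arithmetic: it assumes that each percentage is the gain over the preceding year's norm expressed as a per cent of that norm (the text does not state the formula), and it uses the norms as printed in Table V. The variable and function names are hypothetical, and small differences from the printed percentages may arise from rounding.

```python
# Mirror-drawing norms for ages six to seventeen, as printed in Table V.
boys = [4.7, 6.6, 9.1, 9.4, 10.5, 14.9, 14.7, 15.8, 20.1, 22.4, 24.9, 30.8]
girls = [2.9, 4.9, 5.9, 8.3, 8.9, 10.9, 13.8, 16.1, 22.6, 30.7, 34.4, 38.6]


def percent_improvement(norms):
    """Gain over the preceding year's norm, as a per cent of that norm."""
    return [100 * (later - earlier) / earlier
            for earlier, later in zip(norms, norms[1:])]


boys_gain = percent_improvement(boys)
girls_gain = percent_improvement(girls)

# Print one row per age from seven to seventeen.
for age, (b, g) in enumerate(zip(boys_gain, girls_gain), start=7):
    print(f"age {age:2d}: boys {b:6.1f}   girls {g:6.1f}")
```

On this reading the only negative entry is that for the twelve year old boys, in agreement with the statement above.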


CONCLUSIONS 


1. Mirror-drawing is essentially a trial and error motor learning 
process. 

2. The ability to do the mirror-drawing develops from year to 
year. 

3. All learning curves are more regular for girls than for boys, both 
in simple muscular speed tests and in the more complex motor-sensory 
coordination tests. 

4. Boys make higher mirror-drawing scores up to thirteen; girls 
surpass them in later years and at seventeen are much ahead. 

5. The years seven, eight, and nine are the years of greatest 
motor growth as indicated by the three tests employed in this study. 

6. There is no positive relation between mirror-drawing ability 
and general intelligence. 

7. There is a positive relationship between mirror-drawing ability 
and simple motor speed—marking and making letters. 

8. Age and sex norms are set out in this article for the ages from 
six to seventeen inclusive for mirror-drawing ability, for marking speed, 
and for speed in making letters. 

















STANDARD RESPONSE ERROR IN A MEASURE OF 
IMPROVEMENT 


E. F. LINDQUIST 
State University of Iowa 


In the situation where a fallible test is administered to a group of 
pupils before and after a course of instruction, it is sometimes 
useful to have a measure of the probable amount of error (due to the 
unreliability of the test) in the difference between the initial and 
final scores of a single pupil, i.e., a probable error of the gain in 
score of a single pupil. The formulas here derived will give varying 
degrees of approximation to this probable error. 

Let 

$a$ = initial score on test 

$b$ = final score on (same) test 

$\sigma_a$ = standard deviation of initial scores 

$\sigma_b$ = standard deviation of final scores 

$r_{aA}$ = coefficient of reliability of the test in its initial application 

$r_{bB}$ = coefficient of reliability of the test in its final application 

$\sigma_{e_a} = \sigma_a\sqrt{1 - r_{aA}}$ = standard error of response in initial score 

$\sigma_{e_b} = \sigma_b\sqrt{1 - r_{bB}}$ = standard error of response in final score 

$r_{ab}$ = correlation between initial and final scores 

$\sigma_g$ = standard error in gain in score of a single student 

then 

$$\sigma_g^2 = \sigma_{e_a}^2 + \sigma_{e_b}^2 - 2r_{ab}\,\sigma_{e_a}\sigma_{e_b} \quad \text{(difference between correlated measures)} \tag{1}$$

by substitution of the values of $\sigma_{e_a}$ and $\sigma_{e_b}$ 

$$\sigma_g^2 = \sigma_a^2(1 - r_{aA}) + \sigma_b^2(1 - r_{bB}) - 2r_{ab}\,\sigma_a\sigma_b\,[(1 - r_{aA})(1 - r_{bB})]^{1/2} \tag{2}$$

and 

$$\sigma_g = \sqrt{\sigma_a^2(1 - r_{aA}) + \sigma_b^2(1 - r_{bB}) - 2r_{ab}\,\sigma_a\sigma_b\,[(1 - r_{aA})(1 - r_{bB})]^{1/2}} \tag{3}$$

Formula (3) is that which will yield the most accurate value of the 
desired standard error, but it is quite unwieldy, and useful 
approximations to it may be more conveniently secured. 

If the variability of scores on the test is not significantly 
influenced by the course of instruction (a condition which would 
often be closely approached), i.e., if $\sigma_a = \sigma_b$, then 
formula (3) reduces to 

$$\sigma_g = \sigma_a\sqrt{(1 - r_{aA}) + (1 - r_{bB}) - 2r_{ab}\,[(1 - r_{aA})(1 - r_{bB})]^{1/2}} \tag{4}$$

If, in addition, the reliability of the test is the same in its final 
as in its initial application, i.e., if $r_{aA} = r_{bB}$ (and 
$\sigma_a = \sigma_b$), then formula (3) reduces to 

$$\sigma_g = \sigma_a\sqrt{2(1 - r_{aA})(1 - r_{ab})} = 1.414\,\sigma_a\sqrt{(1 - r_{aA})(1 - r_{ab})} \tag{5}$$

a formula which may be conveniently applied. 

If only the second condition is satisfied, i.e., if $r_{aA} = r_{bB}$, 
but $\sigma_a \neq \sigma_b$, then formula (3) becomes 

$$\sigma_g = \sqrt{(1 - r_{aA})(\sigma_a^2 + \sigma_b^2 - 2r_{ab}\,\sigma_a\sigma_b)} \tag{6}$$

It may happen, in the situation where the effect of training is 
slight relative to the total development of the ability tested, that 
$r_{ab}$ closely approximates $r_{aA}$ in value. Assuming 
$\sigma_a = \sigma_b$, $r_{aA} = r_{bB} = r_{ab}$, formula (3) becomes 

$$\sigma_g = 1.414\,\sigma_a(1 - r_{aA}) \tag{7}$$

While formula (7) is most easily applied, since it requires 
computation of only $\sigma_a$ and $r_{aA}$, it involves three 
assumptions none of which would be completely satisfied and the 
cumulative effect of which is difficult to predict. It should 
therefore be used only after careful consideration of the 
assumptions involved. 
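
The relation among the general formula and its approximations may be seen by computing them for the same figures. The Python sketch below implements formulas (3), (5), and (7) as given above; the function names and the numerical values (a standard deviation of 10, a reliability of .90, and a correlation of .80 between initial and final scores) are hypothetical and serve only to illustrate the calculation. The conversion to a probable error uses the conventional factor 0.6745.

```python
from math import sqrt


def se_gain_general(sd_a, sd_b, rel_a, rel_b, r_ab):
    """Formula (3): standard error of a single pupil's gain."""
    return sqrt(sd_a ** 2 * (1 - rel_a)
                + sd_b ** 2 * (1 - rel_b)
                - 2 * r_ab * sd_a * sd_b * sqrt((1 - rel_a) * (1 - rel_b)))


def se_gain_equal(sd, rel, r_ab):
    """Formula (5): equal standard deviations and equal reliabilities."""
    return sd * sqrt(2 * (1 - rel) * (1 - r_ab))


def se_gain_simplest(sd, rel):
    """Formula (7): as (5), with r_ab also taken equal to the reliability."""
    return sqrt(2) * sd * (1 - rel)


def probable_error(standard_error):
    """Probable error corresponding to a standard error."""
    return 0.6745 * standard_error


# Hypothetical values, chosen only for illustration.
sd, rel, r_ab = 10.0, 0.90, 0.80
print(se_gain_general(sd, sd, rel, rel, r_ab))
print(se_gain_equal(sd, rel, r_ab))
print(se_gain_simplest(sd, rel))
print(probable_error(se_gain_equal(sd, rel, r_ab)))
```

When the conditions of formula (5) hold exactly, it agrees with formula (3); formula (7) agrees with both only when the correlation between initial and final scores equals the reliability coefficient.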


NOTE ON THE RELIABILITY OF THE INDEX OF 
INTEGRATION 


JOHN W. DICKEY 


State Normal School, Newark, N. J. 


The writer has presented elsewhere¹ the formula 

$$K = \frac{M}{\sigma} \tag{1}$$

as a measure of pupil integration within the public schools. It is 
seen to be the ratio of the mean of the pupils’ gross scores, made on 
a test, to the standard deviation of the same gross scores. 

In a footnote of that paper it was stated that errors are always 
present when using formula (1). It is the purpose of this paper to 
derive the standard error and the probable error formulas of equation 
(1). 

By writing the total derivative of equation (1), there results 

$$dK = \frac{\sigma\,dM - M\,d\sigma}{\sigma^2}$$

Squaring, summing and dividing by the theoretical infinite 
population, the differential equation becomes 

$$\sigma_K^2 = \frac{\sigma^2\sigma_M^2 + M^2\sigma_\sigma^2 - 2M\sigma\,r_{M\sigma}\,\sigma_M\sigma_\sigma}{\sigma^4}$$

The third term in the numerator is equal to zero because there is no 
correlation between the two independent variables, $M$ and $\sigma$. 

When substituting the values of $\sigma_M^2$ and $\sigma_\sigma^2$, 
and simplifying, the variance of ($K$) results in the form 

$$\sigma_K^2 = \frac{2\sigma^2 + M^2}{2N\sigma^2}$$

whence 

$$\sigma_K = \frac{\sqrt{2 + K^2}}{\sqrt{2N}} \tag{2}$$

and 

$$PE_K = \frac{0.6745\sqrt{2 + K^2}}{\sqrt{2N}} \tag{3}$$

Formulas (2) and (3) give the standard error and the probable 
error, respectively, of equation (1). 
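
Formulas (1) to (3) may be applied to a set of scores as in the following Python sketch. It is an illustration under stated assumptions: the function names and the scores are hypothetical, and the standard deviation is computed with N (not N - 1) in the denominator, a choice the text does not specify. The result for formula (2) is consistent with the usual large-sample values σ²/N and σ²/2N for the variances of the mean and of the standard deviation, which are assumed here to be the values substituted in the derivation.

```python
from math import sqrt
from statistics import mean, pstdev


def k_standard_error(k, n):
    """Formula (2): standard error of the index K for n pupils."""
    return sqrt(2 + k ** 2) / sqrt(2 * n)


def integration_index(scores):
    """Return K (formula 1), its standard error (2), and probable error (3)."""
    k = mean(scores) / pstdev(scores)  # population SD: an assumption
    se = k_standard_error(k, len(scores))
    return k, se, 0.6745 * se


# Hypothetical gross scores for a small class.
scores = [40, 50, 60]
k, se, pe = integration_index(scores)
print(round(k, 3), round(se, 3), round(pe, 3))
```

Since the standard error grows with K and shrinks with the square root of N, a large index obtained from a small class carries a comparatively large error.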


1 An Index of Integration. Journal of Educational Psychology, Vol. XX, 
No. 9, Dec., 1929, p. 625. 


NEW PUBLICATIONS IN EDUCATIONAL 
PSYCHOLOGY AND RELATED FIELDS OF 
EDUCATION 


CONDUCTED BY FRANCES M. FOSTER 


Studies in Service and Self-control, by H. Hartshorne, M. May, and 
J. Maller. New York: The Macmillan Co., 1929. Pp. 556. 


This second volume in the series of presentations of the results of 
the Character Education Inquiry studies three traits: cooperation, 
persistence, and inhibition. 

Twelve tests of cooperation were devised, the five used in the main
study (N = 800) being: (1) voting part of a class cash prize to charity
instead of to themselves (self r = .51 and .65); (2) sharing part of a
gift kit of pencils, rulers, etc. with children in other grades; (3) bringing
in envelopes of jokes, pictures, stories for hospital children;
(4) working as many problems per minute for class prize as for self
(self r = .90); (5) choosing to give certain sheets of problems to
help class rather than self win a prize (self r = .78). Average
intercorrelation was .16.

Cooperation appeared related to: better reputation for service 
(r = .31); better cultural background (r = .19); being Protestant 
(Protestant score 116, Catholic 107, Jewish 113); less movie attendance 
(114 for never or once a week, to 108 for three or more times per week);
higher intelligence (r = .16); older (r = .18); better school marks 
(r = .32); better deportment (r = .19); being a girl (113 to 111); 
healthier teeth, eyes and nutrition (r = .17); living in Walden, New 
York instead of in New Haven (117 to 111 or 108); belonging to supe- 
rior economic class (117 highest to 109 lowest); parents native born 
(114 to 109); being in a cooperative classroom (correlation average boy
and average girl = .60, classmates at random = .12); having coopera- 
tive sibling (r = .42); having at least five years Sunday School (115 to 
110); attending Sunday School more than twice a month (114 to 110); 
being a member of boys or girls club (115 to 113); being among the 
least rather than the most suggestible (124 to 110 for extremes, 
although r’s run .25, .00, and .00); rated as clean in physical examination
(118 to 116); having friend in same class who is cooperative (.29);
and not being a nail biter (117 to 115). Cooperativeness appeared unrelated
to emotional stability as measured by questionnaires, to economic
status as measured by the Sims scale, to having parents who cooperated 
by filling out an information blank, to parental intelligence, length of 
attendance at school tested, attendance at summer camps, or tendency 
to be well liked by classmates. 

Of nine tests of persistence the following five were used in the main 
study: (1) time worked at puzzling out the remainder of a story, badly 
pied, which has been read to climax (self r = .75 for two stories); (2) time
worked at solving practically impossible magic square; (3) same for 
cross and ring puzzle; (4) difference between number of additions done 
in first two minutes and in last two minutes of a long work period, 
motive to win prize for self (reliability .88); (5) same as (4) but moti- 
vated by class reward (reliability .92). Average intercorrelations .24. 
Reliability of battery .89. 

Persistence appeared related to reputation for persistence (r = .23); 
health (r = .11 but more agreement at extremes); age (r = .33); 
inhibition (r = .22); grade, retardation, intelligence with age constant 
(r = .29); deportment marks (r = .19); being a girl (96 chances out of 
100 of being significant); emotional stability (r = .15 but curvilinear 
relationship with more instability at both extremes); not being a nail
biter (score of 157 to 117 for nail biters); living in a congested section
of New Haven; low occupational status in the best residential section 
or high status in the poorer section or small town; cultured home 
(r = .18); having foreign born parents (73 per cent excel median child 
of native born); being Catholic (180 to 156 or 150); having a persistent
sibling (r = .41); being in a persistent class (r average boy and average 
girl .74); length of attendance at Sunday school (r age constant = .17 
with superior scores for those never attending); being less suggestible
than average (r, intelligence constant, = .14); being liked; having persistent
friend in same class (r = .31); being able to work many problems
in 24 minutes (r, age constant = .15).


Persistence appeared unrelated to school marks; physical condition 
measures; socio-economic status; parental intelligence (except, perhaps,
in poorest community where r = .30); years of attendance in
same school, age constant; belonging to club; attending camp; movie 
attendance (although once or twice a week seems superior to more or 
less); cleanliness (although extremes seem more persistent); having 
persistent friends not in same class, or being in majority sex for class- 
room (although in poorest population r was .85). 
Six group tests, six party tests, and seven individual tests for inhibition
were developed, the four used in the main study being: (1) ability
to add, resisting distractions in the form of pictures around the margin
(reliability .89); (2) following directions not to touch a toy safe on the
desk (reliability .50); (3) following directions not to touch a set-up
of five small puzzles on the desk (reliability .50); (4) not tearing open
a folder to read the ending of an exciting story (reliability .48). Inter- 
correlations averaged .16. Reliability of battery .80. Average inter- 
correlation among individual tests was .23, on party tests .03. 

Inhibition, so measured, was correlated with: reputation for inhibi- 
tion (r = .40); being a girl (girls exceed boys by 5.5 SD’s); having 
better physical condition (r = .13); having higher cultural status of
home (r = .18); having well inhibited sibling (r = .25); having
intelligent parents (r = .18); being in a well inhibited classroom 
(r average boy and average girl = .44); having well inhibited friend 
in same class (r = .20). No relationship appeared to age, grade, 
marks, intelligence, emotional stability, type of community, occupa- 
tion of father, socio-economic rating, nationality of parents, religion, 
years of attendance at same school, Sunday School attendance, club 
membership, camp attendance, movie attendance, suggestibility, score 
of friends in other classes, cleanliness, sustained speed at adding, or 
being in the majority sex. 

The data suggest many diverting speculations. To test all of the 
three traits with a validity (correlation with infinite number of situations)
of .90 would require one hundred situations instead of the fourteen
used, which required thirty pupil hours. Obviously testing
all of character at this rate will prove tedious. Short cuts are needed 
and may appear, it is hinted, in the forthcoming third volume. 

A second observation concerns the varying results obtained with
the three populations studied. Almost anyone, testing two hundred
fifty children in a typical town, would feel a fair confidence in his
results. Actually surprising discrepancies appear. A correlation of
.00 in Walden may be paralleled by .40 or even .80 in New Haven groups,
or vice versa. Groups differ by more than a sampling error when 
character patterns are studied. 

Perhaps there has yet been no more significant contribution to the 
study of character. Character appears to vary with social, not with 
biological factors. Character patterns are often group patterns, not 
discoverable in the individuals who make up the groups. The study
of delinquency has become generally recognized to be more usefully
approached through companions and opportunities than
through neurones; in like fashion, treatment of these more carefully
defined behavior traits appears to be a matter of mores rather than
medicine.
GOODWIN WATSON.
Teachers College, Columbia University.





The Art of Interrogation: Studies in the Principles of Mental Tests and 
Examinations, by E. R. Hamilton, with an introduction by C. 
Spearman. New York: Harcourt, Brace and Co., 1929. Pp. 
XII + 174. 


The subtitle of this book is more descriptive of its real nature than 
is the main title. The main title leads us to expect a book on the art 
of asking questions as a method of teaching in the classroom, and pre- 
pares us for a rather concrete and practical type of material. This 
impression is misleading. The book is of an abstract and philosophical 
type, and is devoted almost wholly to the criticism of mental testing 
technique. 

The seven chapters in the book are entitled “A Description of
Mind,” “The Way of Mental Tests,” “Measurement in Psychology,”
“Examinations Old and New,” “The Testing of Knowledge,” “Questions
in the Making,” “Questioning in the Classroom.”

The author gives the words questioning and interrogation a rather 
broad meaning, making them include the exercises, problems, and 
tasks which often make up the bulk of intelligence tests and achieve- 
ment tests. The preparation of an intelligence test is therefore an 
application of the art of interrogation as the author uses the word. 
The discussions of mental tests and of measurement in psychology are
addressed mainly to the interests of the test makers rather than to the 
interests of classroom teachers. 

The treatment of examinations other than intelligence tests is a somewhat
elementary one and seems to have been designed primarily to call
attention to the importance of substituting objective exercises for the 
usual essay type. There are some helpful suggestions and interesting 
criticisms of the nature of the measurements accomplished by some 
of the objective types of tests. 

The last chapter, on “Questioning in the Classroom,” impresses the
reviewer as being the weakest of the seven. In fact it seems almost to
have been an afterthought, possibly suggested as an addition by someone
who read the original manuscript before its publication. It gives
the impression that the writer is not so much interested in instructional 
technique as in mental measurements. 

While the book does not impress the reviewer as being one which 
will find extensive use in the training of educational workers in Ameri- 
ca, it has two distinct merits which should be mentioned. The first 
is that it is a British contribution, uses quite a bit of terminology which 
is new to American readers, and in general presents a view of psycho- 
logical measurements from an angle not found in American books. 
The second advantage is that the author has gone into his analysis in a 
very thoughtful manner and with a keenness of criticism which makes 
his readers re-examine many things which have previously been classi- 
fied as commonplaces in the measurement field. 

C. C. CRAWFORD.
University of Southern California. 





The Mendenhall-Warren-Hollerith Correlation Method, by Richard 
Warren and Robert M. Mendenhall. New York: Columbia 
University Press, 1929. Pp. II + 36. 

Study Manual in Elementary Statistics, by Everett F. Lindquist and
George D. Stoddard. New York: Longmans, Green and Co., 
1929. Pp. VII + 109. 

Statistics for Beginners in Education, by Frederick Lamson Whitney. 
New York: D. Appleton and Co., 1929. Pp. XVI + 119. 


It is satisfying to notice increasing efforts to facilitate statistical 
computations and to initiate students of education and psychology 
into an appreciation of statistical concepts. The three books named 
above are representative of, and contribute significantly to, at least 
one of these ends. 

The brief manual by Mendenhall and Warren is a description of 
their adaptation of a method by which the Hollerith machines, viz., the
sorter, the printing tabulator, the duplicating keypunch, and the veri- 
fier, together with adding and calculating machines, can be utilized in 
computing coefficients of correlation. Some fragmentary information 
regarding this process has been available, but the manual is reasonably 
complete. The method involves the general steps of: (1) compilation
of original data sheets; (2) punching of the data upon cards; (3)
verification of punched cards by means of a mechanical verifier; (4)
sorting and tabulating the cards by machine; (5) computing the cumulative
totals by means of an adding machine; (6) checking of additions;
and (7) solving by means of the Mendenhall-Warren-Hollerith form 
with the aid of a Monroe calculator for means, sigmas, and correlation 
coefficients, of both zero and higher orders, together with appropriate 
checking of arithmetical calculation. The procedure utilizes the
Pearson product-moment formula and is especially economical when
a large number of intercorrelations are to be determined. Cuts
illustrate the general lay-out of the statistical bureau and picture
the machines used. The cost of such
elaborate equipment and the unavailability of experienced machine 
operators are the practical objections which can be raised against the 
proposed program. However, for large surveys and extensive research 
programs the system described appears to be very efficient and reduces 
to practicable limits the arduousness and time and cost requirements 
of calculations which are based upon a large number of cases. It is 
possible to foresee the establishment of such bureaus as more or less 
commercial agencies through which mechanical treatment of data 
can be completed for institutions or individuals at a reasonable cost. 
The procedure as advocated is an effective and practicable adaptation 
of elementary statistical computation to machine technique. 
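
The review does not reproduce the Mendenhall-Warren-Hollerith form itself. As a generic sketch of the kind of computation described, the means, sigmas, and product-moment coefficient can be obtained from cumulative totals alone, which is what makes tabulating and adding machines sufficient. The function names and the sample pairs below are hypothetical, and sigmas are taken with divisor N as an assumption.

```python
import math


def cumulative_totals(pairs):
    """Accumulate N, sum X, sum Y, sum X^2, sum Y^2, sum XY, as an adding machine would."""
    n = sx = sy = sxx = syy = sxy = 0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    return n, sx, sy, sxx, syy, sxy


def pearson_from_totals(n, sx, sy, sxx, syy, sxy):
    """Return (mean X, mean Y, sigma X, sigma Y, r) computed from the totals only."""
    mean_x, mean_y = sx / n, sy / n
    var_x = sxx / n - mean_x ** 2
    var_y = syy / n - mean_y ** 2
    if var_x <= 0 or var_y <= 0:
        raise ValueError("a variable with zero variance has no correlation")
    covariance = sxy / n - mean_x * mean_y
    sigma_x, sigma_y = math.sqrt(var_x), math.sqrt(var_y)
    return mean_x, mean_y, sigma_x, sigma_y, covariance / (sigma_x * sigma_y)


# Hypothetical score pairs standing in for punched-card data.
totals = cumulative_totals([(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)])
mean_x, mean_y, sigma_x, sigma_y, r = pearson_from_totals(*totals)
print(f"means {mean_x}, {mean_y}; r = {r:.3f}")
```

Since each pair of variables needs only one additional cross-product total, the totals for many variables can be accumulated in a single pass through the cards, which is consistent with the economy the reviewer notes for large sets of intercorrelations.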

The “Study Manual” of Lindquist and Stoddard is designed to aid
the student to develop “statistical judgment.” The authors state,
and in part they are correct, that present courses in statistics emphasize 
computations. Appreciation, use, application, and knowledge of 
limitation of statistical concepts are certainly of first importance. 
The Manual is designed to develop such appreciation when used in 
connection with texts such as Garrett, Rugg, Odell, Holzinger, Mills, 
Thurstone, Kelley, and others. By exposition, illustrations, examples, 
problems, and questions, formal text material is elaborated and applied.
The material studied ranges from a study of the significance of numbers 
to the final topic which treats of partial and multiple correlation. It 
is the judgment of the reviewer that the “Study Manual” will aid substantially
in developing understanding and appreciation of statistical
concepts. 

“Statistics for Beginners in Education” has been written for those,
whether now in educational work or in preparation for it, who have
had no training in statistics. It is a primer or even a “kindergartner”;
as such it merits unusual recognition. It is not concerned with statistical
computations; it aims merely to illustrate and to state the
meanings of certain selected statistical terms. Another unusual
feature is its success in integrating these concepts into the very method 
of school room practice and in making clear their functions as necessary 
educational tools. The elementary nature of the book is illustrated
by oversimplifications which an erudite statistician might
consider gross errors. For example, no distinction is drawn between
the true mode and a crude mode; or again there is the implication 
that in the variability of a quantity the range of plus and minus 
one probable error includes all such variation (p. 103). There are other 
oversimplifications. However, the purpose of the book might not 
have been achieved so well had all the necessary qualifications and 
explanations been included. The book merits praise as an intro- 
duction to statistics in the field of education because of its simplicity; 
its style and organization are well considered; its illustrative material 
succeeds. Its academic faults should be corrected in subsequent 
texts or courses. It should be valuable to students in education to 
help them to understand the concepts which they meet in their reading. 
Those who are about to begin a formal study of statistics may find 
in it a simplicity and clarity that may reduce the shock which is not 
uncommonly encountered in the first semester of a substantial course 
in the abstractions and mysteries of pure and applied statistics. 
EDWIN MAURICE BAILOR.
Dartmouth College.